SQL-like query language for csv

Command Usage

csvq [options] [subcommand] ["query"|argument]

Execution of Statements

There are three ways to execute a query or statements.

# Pass to the csvq command as an argument
$ csvq 'SELECT id, name FROM users'

# Load from file
$ cat statements.sql
VAR @id := 0;
SELECT @id := @id + 1 AS id,
       name
  FROM users;

$ csvq -s statements.sql

# Execute in the interactive shell
$ csvq
csvq > SELECT id, name FROM users;
+----+-------+
| id | name  |
+----+-------+
| 1  | Louis |
| 2  | Sean  |
+----+-------+
csvq > UPDATE users SET name = 'Mildred' WHERE id = 2;
1 record updated on "/home/mithrandie/docs/csv/users.csv".
csvq > SELECT id, name FROM users;
+----+----------+
| id | name     |
+----+----------+
| 1  | Louis    |
| 2  | Mildred  |
+----+----------+
csvq > COMMIT;
Commit: file "/home/mithrandie/docs/csv/users.csv" is updated.
csvq >

Options

–allow-uneven-fields
Allow loading CSV files with uneven field length.
–ansi-quotes, -k
Use double quotation mark (U+0022 ") as identifier enclosure.
–color, -c
Use ANSI color escape sequences.
–count-diacritical-sign, -S
Count diacritical signs as halfwidth. If not, then that characters are counted as zero-width.
–count-format-code, -A
Count format characters and zero-width spaces as halfwidth. If not, then that characters are counted as zero-width.
–cpu, -p
Hint for the number of cpu cores to be used. The default is the half of the number of cpu cores.
–datetime-format value, -t value
Datetime Format to parse strings.

Format string is the same as the function DATETIME_FORMAT.

This option can be specified multiple formats using JSON array of strings.

–delimiter value, -d value
Field delimiter for CSV. The default is a comma(U+002C ,).

A delimiter must be one character. Special Characters can be used with backslash escaping.

–delimiter-positions value, -m value
Delimiter positions for Fixed-Length Format. The default is “SPACES”.

You can specify “SPACES” or a JSON Array of integers. “SPACES” splits lines automatically by spaces. JSON Array is used to specify delimiter positions explicitly.

Delimiter positions indicate the number of bytes from the start of the line. For example, JSON Array “[5, 10, 15]” splits “1234567890abcde” as “12345, 67890, abcde”

If you want to operate Single-Line Fixed-Length Format, then connect a JSON Array to “S”(U+0053) or “s”(U+0073). For example, “S[2, 3, 6]” imports “01aabc02bdef03cghi” as “(‘01’, ‘a’, ‘abc’), (‘02’, ‘b’, ‘def’), (‘03’, ‘c’, ‘ghi’)”.

–east-asian-encoding, -W
Count ambiguous characters as fullwidth. If not, then that characters are counted as halfwidth.
–enclose-all, -Q
Enclose all string values in CSV.
–encoding value, -e value
File encoding. Following encodings are supported. The default is AUTO.
value(case ignored) character encoding
AUTO Detect file encoding automatically
UTF8 UTF-8. Detect BOM automatically
UTF8M UTF-8 with BOM
UTF16 UTF-16. Detect BOM and Endian automatically
UTF16BE UTF-16 Big-Endian
UTF16LE UTF-16 Little-Endian
UTF16BEM UTF-16 Big-Endian with BOM
UTF16LEM UTF-16 Little-Endian with BOM
SJIS Shift_JIS

JSON Format is supported only UTF-8.

Whatever the value of this option is, if the first character in a file is a UTF-8 byte order mark, the file will be loaded as UTF-8 encoding.

–format value, -f value
Format of query results. The default is TEXT, but CSV is used for output to pipe.
value(case ignored) format
CSV Character separated values. Separetor can be changed by -D option.
TSV Tab separated values
FIXED Fixed-Length Format
JSON JSON
JSONL JSON Lines
LTSV Labeled Tab-separated Values
GFM Text Table for GitHub Flavored Markdown
ORG Text Table for Emacs Org-mode
BOX Text Table using Box-drawing characters
TEXT Text Table for console
JSONH Alias of “–format JSON –json-escape HEX”
JSONA Alias of “–format JSON –json-escape HEXALL”

When this option is not specified, the file specified by the –out option will be output in a specific format if it has the following extensions.

file extension format
.csv CSV
.tsv TSV
.json JSON
.jsonl JSON Lines
.ltsv LTSV
.md GFM
.org ORG
–import-format value, -i value
Default format to load files. The default is CSV.
value(case ignored) format
CSV Character separated values. Separetor can be changed by -D option.
TSV Tab separated values
FIXED Fixed-Length Format
JSON JSON
JSONL JSON Lines
LTSV Labeled Tab-separated Values

Regardless of this option, files with the following extensions will be read in a specific format.

file extension format
.csv CSV
.tsv TSV
.json JSON
.jsonl JSON Lines
.ltsv Labeled Tab-separated Values
–json-escape, -J
JSON escape type. The default is BACKSLASH.
value(case ignored) description
BACKSLASH Escape special characters with backslashes(U+005C \)
HEX Escape special characters with six-character sequence
HEXALL Escape all strings with six-character sequence

Escaped characters in JSON

–json-query QUERY, -j QUERY
QUERY for JSON.
–limit-recursion
Maximum number of iterations for recursive queries. “-1” means no limit. The default is 1000.
–line-break value, -l value
Line break in query results. One of following values. The default is LF.
value(case ignored) unicode character
CRLF U+000D Carriage Return and U+000A Line Feed
CR U+000D Carriage Return
LF U+000A Line Feed
–no-header, -n
Import the first line as a record.

First line of a CSV file is dealt with as the header line. In case “–no-header” option passed, fields are automatically named as “c” and following sequential number. e.g. “c1”, “c2”, “c3”, …

–out FILE, -o FILE
Export result sets of select queries to FILE.

If the output file is not specified, the result sets are written to standard output.

–pretty-print, -P
Make JSON output easier to read in query results.
–quiet, -q
Suppress operation log output.
–repository PATH, -r PATH
Directory Path where files are located. The default is the current directory.
–scientific-notation -SN
Use Scientific Notation for large exponents in output.
–source FILE, -s FILE
Load query or statements from FILE.
–stats, -x
Show execution time and memory statistics.
Query Execution Time
execution time of one query. select, insert, update, or delete queries are measured.
TotalTime
total execution time
TotalAlloc
cumulative bytes of allocated heap objects
HeapSys
bytes of heap memory obtained from the OS
Mallocs
cumulative count of heap objects allocated
Frees
cumulative count of heap objects freed
–strict-equal, -g
Compare strictly that two values are equal for DISTINCT, GROUP BY and ORDER BY.
–strip-ending-line-break, -T
Strip line break from the end of files and query results.
–timezone value, -z value
Default Timezone. The default is Local.

Local, UTC or a timezone name in the IANA TimeZone database(in the form of “Area/Location”. e.g. “America/Los_Angeles”).

The timezone database is required in order to use the timezone names. Most Unix-like systems provide the database. But if your system does not provide it and you have not installed Go Lang, then you must put the database file named zoneinfo.zip to the directory “$ZONEINFO” or “$GOROOT/lib/time/”.

–wait-timeout value, -w value
Limit of the waiting time in seconds to wait for locked files to be released. The default is 10.
–without-header, -N
Export result sets of select queries without the header line.
–without-null, -a
Parse empty fields as empty strings.

In most cases CSV fields are imported as string values, but no-quoted empty fields are imported as nulls. By using the “–without-null” option, no-quoted empty fields are imported as empty string values.

–write-delimiter value, -D value
Field delimiter for query results in CSV format. The default is a comma(U+002C ,).
–write-delimiter-positions value, -M value
Delimiter positions for query results in Fixed-Length format. The default is “SPACES”.

If the field value is shorter than the length of the field, the missing part is padded with SPACE(U+0020).
For example, JSON Array “[5, 10, 15]” combines “123, abc, def” into “␣␣123abc␣␣def␣␣”.

–write-encoding value, -E value
Character encoding of query results. The default is UTF8.
value(case ignored) character encoding
UTF8 UTF-8
UTF8M UTF-8 with BOM
UTF16 An alias of UTF16BE
UTF16BE UTF-16 Big-Endian
UTF16LE UTF-16 Little-Endian
UTF16BEM UTF-16 Big-Endian with BOM
UTF16LEM UTF-16 Little-Endian with BOM
SJIS Shift_JIS
–help, -h
Show help
–version, -v
Print the version

If you want to pass “false” to a boolean command option, you can specify it as “–option-name=false”.
Some command options can also be specified in statements by using Set Flag Statements.

Determination of file format

Loading

First, the file extension to be loaded is checked. If the extension is displayed in the following table, the associated file format will be selected. Otherwise, the passed value by the “–import-format” option will be used to load.

extension file format
.csv CSV
.tsv TSV
.json JSON
.jsonl JSON Lines
.ltsv LTSV

The following options are available for loading.

You can also use Format Specified Functions to specify the format each file. A format specified function effects the first loading in a transaction. After the second loading, the specifications in the format specified function are ignored. You must use the ROLLBACK statement to discard all changes in the transaction if you reload the same file.

Updating

The table attributes that were determined when loading will be used to updating. You can modify table attributes by using SET ATTRIBUTE Statement).

Creating

First, the file extension to be loaded is checked. If the extension is displayed in the following table, the associated file format will be selected. Otherwise, CSV is used as the format. You can modify table attributes by using SET ATTRIBUTE Statement).

extension file format
.csv CSV
.tsv TSV
.json JSON
.jsonl JSON Lines
.ltsv LTSV
.md GitHub Flavored Markdown
.org Emacs Org-mode

Exporting query results with the “–out” option

The passed value by the “–format” option will be used to export.

The following options are available for exporting.

Subcommands

subcommand description
calc Calculate value from stdin
check-update Check for updates
fields Show fields in file
syntax Print syntax
help, h Shows help

Calc Subcommand

Calculate a value from stdin.

csvq [options] calc "expression"

In the expression, the value from stdin is represented as “c1”.

Example:

$ echo 'foo' | csvq calc "base64_encode(c1)"
Zm9v
$ echo 123 | csvq calc "c1 + 100"
223

Check Update Subcommand

Check for updates.

csvq [options] check-update [subcommand options]

Fields Subcommand

Show fields in a file.

csvq [options] fields CSV_FILE_PATH

Syntax Subcommand

Print syntax.

csvq [options] syntax [search_word ...]

Subcommand Options

–include-pre-release
Check including pre-release version.

Configurations

Before an execution of csvq, the following processings will be performed.

  1. Load Environment Configurations.
  2. Execute Pre-Load Statements.
  3. Overwrite Flags with Command Options.

Environment Configurations

Configuration Files are loaded in the following order.

  1. $XDG_CONFIG_HOME/csvq/csvq_env.json or $HOME/.config/csvq/csvq_env.json
  2. $HOME/.csvq_env.json
  3. $HOME/.csvq/csvq_env.json
  4. CURRENT_DIRECTORY/csvq_env.json

Download a JSON file from https://github.com/mithrandie/csvq/blob/master/csvq_env.json and put it to the above paths. If the files exists in the multiple paths, then all existing files will be loaded and configuration items are overwritten with the values in the file that is loaded later.

Configuration Items in JSON

Item Format default
datetime_format array of strings  
timezone string Local
ansi_quotes bool false
interactive_shell.history_file string .csvq_history
interactive_shell.history_limit number 500
interactive_shell.prompt string  
interactive_shell.continuous_prompt string  
interactive_shell.completion bool true
interactive_shell.kill_whole_line bool false
interactive_shell.vi_mode bool false
environment_variables object{var_name: string}  
palette.effectors object{effect_name: effect_object}  
Interactive Shell

Except for Prompt and Continuous Prompt, items in this category are effective only on the following systems.

History File

File name to save command history. If it is a relative path, then the path is interpreted as a relative path from your home directory. On UNIX-like systems, you can use environment variable such as $HOME or ${HOME} to specify the path. If not, ‘~’ can be used to represent home directory.

History Limit

Max length of command history. If history_limit is set to -1, then the command history is disabled.

Prompt, Continuous Prompt

Appearance of the prompt on the interactive shell. You can embed some expressions in the appearance by using a Dollar Sign(U+0024 $) and Curly Brackets(U+007B, U+007D {})

Completion

Whether to use completion.

Kill Whole Line

If true then keyboard shortcut “Ctrl+U” will remove the entire line. If not, the shortcut will remove before the current cursor position on the line.

Vi Mode

Whether to use vi-mode.

Effect Object
Effects

Effects are specified with an array of strings. Following strings are supported.

Colors

Foreground and background colors are specified with one of the following values.

8 Color (String)
Following strings are supported.
  • Black Red Green Yellow Blue Magenta Cyan White BrightBlack BrightRed BrightGreen BrightYellow BrightBlue BrightMagenta BrightCyan BrightWhite DefaultColor
256 Color (Number)
Color number from 0 to 255.
RGB Color (Array of 3 numbers)
The first element is the intensity of red between 0 and 255, the second is green, and the third is blue.

Pre-Load Statements

Files in which statements are written will be loaded and executed in the following order.

  1. $XDG_CONFIG_HOME/csvq/csvqrc or $HOME/.config/csvq/csvqrc
  2. $HOME/.csvqrc
  3. $HOME/.csvq/csvqrc
  4. CURRENT_DIRECTORY/csvqrc

Special Characters

In command parameters and statements, following strings represent special characters.

string unicode character
\a U+0007 Bell
\b U+0008 Backspace
\f U+000C Form Feed
\n U+000A Line Feed
\r U+000D Carriage Return
\t U+0009 Horizontal Tab
\v U+000b Vertical Tab
\” U+0022 Double Quote
\’ U+0027 Single Quote (in strings only)
\` U+0060 Grave Accent (in identifiers only)
\\ U+005c Backslash

Escaped Characters in JSON output

character BACKSLASH HEX HEXALL
U+0022 Quotation Mark \” \u0022 \u0022
U+005C Backslash \\ \u005C \u005C
U+002F Solidus \/ \u002F \u002F
U+0008 Backspace \b \u0008 \u0008
U+000C Form Feed \f \u000C \u000C
U+000A Line Feed \n \u000A \u000A
U+000D Carriage Return \r \u000D \u000D
U+0009 Horizontal Tab \t \u0009 \u0009
U+0000 - U+001F \uXXXX \uXXXX \uXXXX
- U+FFFF N/A N/A \uXXXX
U+10000 - N/A N/A \uXXXX\uXXXX (UTF-16 surrogate pair)

Return Code

code description
0 Normally terminated
1 Errors inside the csvq
2 Incorrect command usage
4 Syntax errors
8 Timeout errors
16 I/O errors
32 Errors outside the csvq
64 The default of triggered errors by TRIGGER ERROR statements
128+n Terminated by signal “n