Select Query
Select query is used to retrieve data from csv files.
select_query
: [with_clause]
select_entity
[order_by_clause]
[limit_clause]
[FOR UPDATE]
select_entity
: select_clause
[from_clause]
[where_clause]
[group_by_clause]
[having_clause]
| select_set_entity set_operator [ALL] select_set_entity
select_set_entity
: select_entity
| (select_query)
- with_clause
- With Clause
- select_clause
- Select Clause
- from_clause
- From Clause
- where_clause
- Where Clause
- group_by_clause
- Group By Clause
- having_clause
- Having Clause
- order_by_clause
- Order By Clause
- limit_clause
- Limit Clause
- set_operator
- Set Operators
With Clause
WITH common_table_expression [, common_table_expression ...]
- common_table_expression
- Common Table Expression
Select Clause
SELECT [DISTINCT] field [, field ...]
Distinct
You can use DISTINCT keyword to retrieve only unique records.
field syntax
field
: value
| value AS alias
| *
| table_name.*
- value
- value
- alias
- identifier
- table_name
- identifier
- *
- Asterisk(U+002A
*
) denotes all columns.When used alone, the asterisk selects all columns in all tables; when used with a table name, it selects all columns in that table.
From Clause
FROM table [, {table|LATERAL laterable_table} ...]
If multiple tables have been enumerated, tables are joined using cross join.
table syntax
table
: table_entity
| table_entity alias
| table_entity AS alias
| join
| DUAL
| laterable_table
| (table)
table_entity
: table_identifier
| format_specified_function
| inline_format_specified_function
table_identifier
: table_name
| url
| table_identification_function
| STDIN
laterable_table
: subquery
| subquery alias
| subquery AS alias
subquery
: (select_query)
join
: table CROSS JOIN table
| table [INNER] JOIN table join_condition
| table {LEFT|RIGHT|FULL} [OUTER] JOIN table join_condition
| table NATURAL [INNER] JOIN table
| table NATURAL {LEFT|RIGHT|FULL} [OUTER] JOIN table
| table CROSS JOIN LATERAL laterable_table
| table [INNER] JOIN LATERAL laterable_table join_condition
| table LEFT [OUTER] JOIN LATERAL laterable_table join_condition
| table NATURAL [INNER] JOIN LATERAL laterable_table
| table NATURAL LEFT [OUTER] JOIN LATERAL laterable_table
join_condition
: ON condition
| USING (column_name [, column_name, ...])
table_identification_function
: FILE::(file_path)
: INLINE::(file_path)
: URL::(url_string)
: DATA::(data_string)
format_specified_function
: CSV(delimiter, table_identifier [, encoding [, no_header [, without_null]]])
| FIXED(delimiter_positions, table_identifier [, encoding [, no_header [, without_null]]])
| JSON(json_query, table_identifier)
| JSONL(json_query, table_identifier)
| LTSV(table_identifier [, encoding [, without_null]])
inline_format_specified_function -- Deprecated. Table identification functions can be used instead.
: CSV_INLINE(delimiter, inline_table_identifier [, encoding [, no_header [, without_null]]])
| CSV_INLINE(delimiter, csv_data)
| JSON_INLINE(json_query, inline_table_identifier [, encoding [, no_header [, without_null]]])
| JSON_INLINE(json_query, json_data)
inline_table_identifier
: table_name
| url_identifier
- table_name
- identifier
A table_name represents a file path, a temporary table, or a inline table. You can use absolute path or relative path from the directory specified by the ”–repository” option as a file path.
When the file name extension is “.csv”, “.tsv”, “.json”, “.jsonl” or “.txt”, the format to be loaded is automatically determined by the file extension, and you can omit it.
FROM `user.csv` -- Relative path FROM `/path/to/user.csv` -- Absolute path FROM user -- Relative path without file extension
The specifications of the command options are used as file attributes such as encoding to be loaded. If you want to specify the different attributes for each file, you can use table_object expressions for each file to load.
Once a file is loaded, then the data is cached, and it can be loaded with only file name after that within the transaction.
- url
- A string of characters representing URL starting with a schema name and a colon.
“http”, “https” and “file” schemes are available.
https://example.com/files/data.csv -- Remote resource downloaded using HTTP GET method file:///C:/Users/yourname/files/data.csv -- Local file specified by absolute path file:./data.csv -- Local file specified by relative path
An inline table is created from remote resources. The downloaded data is cached until the transaction ends.
The file format is automatically determined when the http response specifies the following content types.
MIME type | Format |
---|---|
text/csv | CSV |
application/json | JSON |
- table_identification_function
- Function notation with a name followed by two colons.
-
FILE::(file_path)
file_path: string
This is the same as specifying a file using table_name.
-
INLINE::(file_path)
file_path: string
Files read by this function are not cached and cannot be updated.
-
URL::(url_string)
url_string: string
When specifying a resource using url, the path must be encoded, but this function does not require encoding.
-
DATA::(data_string)
file_path: string
This function creates an inline table from a string.
Example of use in a query:
SELECT id, tag_name, (SELECT COUNT(*) FROM JSON('', DATA::(assets))) AS number_of_assets, published_at FROM https://api.github.com/repos/mithrandie/csvq/releases WHERE prerelease = false ORDER BY published_at DESC LIMIT 10
-
- alias
- identifier
If alias is not specified, table_name stripped its directory path and extension is used as alias.
-- Following expressions are equivalent FROM `/path/to/user.csv` FROM `/path/to/user.csv` AS user
- select_query
- Select Query
- condition
- value
- column_name
- identifier
- delimiter
- string
- json_query
- JSON Query
Empty string is equivalent to “{}”.
- delimiter_positions
- string
“SPACES” or JSON Array of integers
- encoding
- string or identifier
“AUTO”, “UTF8”, “UTF8M”, “UTF16”, “UTF16BE”, “UTF16LE”, “UTF16BEM”, “UTF16LEM” or “SJIS”.
- no_header
- boolean
- without_null
- boolean
- url_identifier
- identifier
A URL of the http or https scheme to refer to a resource.
- csv_data
- string
- json_data
- string
Special Tables
- DUAL
- The dual table has one column and one record, and the only field is empty. This table is used to retrieve pseudo columns.
- STDIN
- The stdin table loads data from pipe or redirection as a csv data. The stdin table is one of temporary tables that is declared automatically. This table cannot to be used in the interactive shell.
Where Clause
The Where clause is used to filter records.
WHERE condition
- condition
- value
Group By Clause
The Group By clause is used to group records.
GROUP BY field [, field ...]
- field
- value
Having Clause
The Having clause is used to filter grouped records.
HAVING condition
- condition
- value
Order By Clause
The Order By clause is used to sort records.
ORDER BY order_item [, order_item ...]
order item
order_item
: field [order_direction] [null_position]
order_direction
: {ASC|DESC}
null_position
: NULLS {FIRST|LAST}
- field
- value
If DISTINCT keyword is specified in the select clause, you can use only enumerated fields in the select clause as field.
- order_direction
- ASC sorts records in ascending order. DESC sorts in descending order. ASC is the default.
- null_position
- FIRST puts null values first. LAST puts null values last. If order_direction is specified as ASC then FIRST is the default, otherwise LAST is the default.
Limit Clause
The Limit clause is used to specify the maximum number of records to return and exclude the first set of records.
limit_clause
: LIMIT number_of_records [{ROW|ROWS}] [{ONLY|WITH TIES}] [offset_clause]
| LIMIT percentage PERCENT [{ONLY|WITH TIES}] [offset_clause]
| [offset_clause] FETCH {FIRST|NEXT} number_of_records {ROW|ROWS} [{ONLY|WITH TIES}]
| [offset_clause] FETCH {FIRST|NEXT} percentage PERCENT [{ONLY|WITH TIES}]
| offset_clause
offset_clause
: OFFSET number_of_records [{ROW|ROWS}]
ROW and ROWS after number_of_records, FIRST and NEXT after FETCH, and ONLY keyword does not affect the result.
If PERCENT keyword is specified, maximum number of records is percentage percent of the result set that includes the excluded records by offset_clause.
If WITH TIES keywords are specified, all records that have the same sort keys specified by Order By Clause as the last record of the limited records are included in the records to return. If there is no Order By Clause in the query, WITH TIES keywords are ignored.