Select Query

Select query is used to retrieve data from csv files.

select_query
  : [with_clause]
      select_entity
      [order_by_clause]
      [limit_clause]
      [FOR UPDATE]

select_entity
  : select_clause
      [from_clause]
      [where_clause]
      [group_by_clause]
      [having_clause]
  | select_set_entity set_operator [ALL] select_set_entity 

select_set_entity
  : select_entity
  | (select_query)

with_clause: With Clause
select_clause: Select Clause
from_clause: From Clause
where_clause: Where Clause
group_by_clause: Group By Clause
having_clause: Having Clause
order_by_clause: Order By Clause
limit_clause: Limit Clause
set_operator: Set Operators

With Clause

WITH common_table_expression [, common_table_expression ...]

common_table_expression: Common Table Expression

Select Clause

SELECT [DISTINCT] field [, field ...]

Distinct

You can use DISTINCT keyword to retrieve only unique records.

field syntax

field
  : value
  | value AS alias
  | *
  | table_name.*

value: value
alias: identifier
table_name: identifier
*: Asterisk(U+002A *) denotes all columns.
When used alone, the asterisk selects all columns in all tables; when used with a table name, it selects all columns in that table.

From Clause

FROM table [, {table|LATERAL laterable_table} ...]

If multiple tables have been enumerated, tables are joined using cross join.

table syntax

table
  : table_entity
  | table_entity alias 
  | table_entity AS alias
  | join
  | DUAL
  | laterable_table
  | (table)

table_entity
  : table_identifier
  | format_specified_function
  | inline_format_specified_function

table_identifier
  : table_name
  | url
  | table_identification_function
  | STDIN
  
laterable_table
  : subquery
  | subquery alias
  | subquery AS alias

subquery
  : (select_query)

join
  : table CROSS JOIN table
  | table [INNER] JOIN table join_condition
  | table {LEFT|RIGHT|FULL} [OUTER] JOIN table join_condition
  | table NATURAL [INNER] JOIN table
  | table NATURAL {LEFT|RIGHT|FULL} [OUTER] JOIN table
  | table CROSS JOIN LATERAL laterable_table
  | table [INNER] JOIN LATERAL laterable_table join_condition
  | table LEFT [OUTER] JOIN LATERAL laterable_table join_condition
  | table NATURAL [INNER] JOIN LATERAL laterable_table
  | table NATURAL LEFT [OUTER] JOIN LATERAL laterable_table

join_condition
  : ON condition
  | USING (column_name [, column_name, ...])

table_identification_function
  : FILE::(file_path)
  : INLINE::(file_path)
  : URL::(url_string)
  : DATA::(data_string)

format_specified_function
  : CSV(delimiter, table_identifier [, encoding [, no_header [, without_null]]])
  | FIXED(delimiter_positions, table_identifier [, encoding [, no_header [, without_null]]])
  | JSON(json_query, table_identifier)
  | JSONL(json_query, table_identifier)
  | LTSV(table_identifier [, encoding [, without_null]])

inline_format_specified_function  -- Deprecated. Table identification functions can be used instead.
  : CSV_INLINE(delimiter, inline_table_identifier [, encoding [, no_header [, without_null]]])
  | CSV_INLINE(delimiter, csv_data)
  | JSON_INLINE(json_query, inline_table_identifier [, encoding [, no_header [, without_null]]])
  | JSON_INLINE(json_query, json_data)

inline_table_identifier
  : table_name
  | url_identifier

table_name

identifier

A table_name represents a file path, a temporary table, or a inline table. You can use absolute path or relative path from the directory specified by the ”–repository” option as a file path.

When the file name extension is “.csv”, “.tsv”, “.json”, “.jsonl” or “.txt”, the format to be loaded is automatically determined by the file extension, and you can omit it.

FROM `user.csv`          -- Relative path
FROM `/path/to/user.csv` -- Absolute path
FROM user                -- Relative path without file extension

The specifications of the command options are used as file attributes such as encoding to be loaded. If you want to specify the different attributes for each file, you can use table_object expressions for each file to load.

Once a file is loaded, then the data is cached, and it can be loaded with only file name after that within the transaction.

url

A string of characters representing URL starting with a schema name and a colon.

“http”, “https” and “file” schemes are available.

https://example.com/files/data.csv       -- Remote resource downloaded using HTTP GET method
file:///C:/Users/yourname/files/data.csv -- Local file specified by absolute path
file:./data.csv                          -- Local file specified by relative path

An inline table is created from remote resources. The downloaded data is cached until the transaction ends.

The file format is automatically determined when the http response specifies the following content types.

MIME type	Format
text/csv	CSV
application/json	JSON

table_identification_function

Function notation with a name followed by two colons.

FILE::(file_path)

file_path: string

This is the same as specifying a file using table_name.
INLINE::(file_path)

file_path: string

Files read by this function are not cached and cannot be updated.
URL::(url_string)

url_string: string

When specifying a resource using url, the path must be encoded, but this function does not require encoding.
DATA::(data_string)

file_path: string

This function creates an inline table from a string.

Example of use in a query:

SELECT id,
       tag_name,
       (SELECT COUNT(*) FROM JSON('', DATA::(assets))) AS number_of_assets,
       published_at
  FROM https://api.github.com/repos/mithrandie/csvq/releases
 WHERE prerelease = false
 ORDER BY published_at DESC
 LIMIT 10

alias

identifier

If alias is not specified, table_name stripped its directory path and extension is used as alias.

-- Following expressions are equivalent
FROM `/path/to/user.csv`
FROM `/path/to/user.csv` AS user

select_query

condition

column_name

delimiter

json_query

JSON Query

Empty string is equivalent to “{}”.

delimiter_positions

string

“SPACES” or JSON Array of integers

encoding

string or identifier

“AUTO”, “UTF8”, “UTF8M”, “UTF16”, “UTF16BE”, “UTF16LE”, “UTF16BEM”, “UTF16LEM” or “SJIS”.

no_header

boolean

without_null

boolean

url_identifier

identifier

A URL of the http or https scheme to refer to a resource.

csv_data

string

json_data

string

Special Tables

DUAL: The dual table has one column and one record, and the only field is empty. This table is used to retrieve pseudo columns.
STDIN: The stdin table loads data from pipe or redirection as a csv data. The stdin table is one of temporary tables that is declared automatically. This table cannot to be used in the interactive shell.

Where Clause

The Where clause is used to filter records.

WHERE condition

condition: value

Group By Clause

The Group By clause is used to group records.

GROUP BY field [, field ...] 

field: value

Having Clause

The Having clause is used to filter grouped records.

HAVING condition

condition: value

Order By Clause

The Order By clause is used to sort records.

ORDER BY order_item [, order_item ...]

order item

order_item
  : field [order_direction] [null_position]
  
order_direction
  : {ASC|DESC}
  
null_position
  : NULLS {FIRST|LAST}

field: value
If DISTINCT keyword is specified in the select clause, you can use only enumerated fields in the select clause as field.
order_direction: ASC sorts records in ascending order. DESC sorts in descending order. ASC is the default.
null_position: FIRST puts null values first. LAST puts null values last. If order_direction is specified as ASC then FIRST is the default, otherwise LAST is the default.

Limit Clause

The Limit clause is used to specify the maximum number of records to return and exclude the first set of records.

limit_clause
  : LIMIT number_of_records [{ROW|ROWS}] [{ONLY|WITH TIES}] [offset_clause]
  | LIMIT percentage PERCENT [{ONLY|WITH TIES}] [offset_clause]
  | [offset_clause] FETCH {FIRST|NEXT} number_of_records {ROW|ROWS} [{ONLY|WITH TIES}]
  | [offset_clause] FETCH {FIRST|NEXT} percentage PERCENT [{ONLY|WITH TIES}]
  | offset_clause

offset_clause
  : OFFSET number_of_records [{ROW|ROWS}]

number_of_records: integer
percent: float

ROW and ROWS after number_of_records, FIRST and NEXT after FETCH, and ONLY keyword does not affect the result.

If PERCENT keyword is specified, maximum number of records is percentage percent of the result set that includes the excluded records by offset_clause.

If WITH TIES keywords are specified, all records that have the same sort keys specified by Order By Clause as the last record of the limited records are included in the records to return. If there is no Order By Clause in the query, WITH TIES keywords are ignored.