SQL-like query language for csv

Select Query

Select query is used to retrieve data from csv files.

select_query
  : [with_clause]
      select_entity
      [order_by_clause]
      [limit_clause]
      [FOR UPDATE]

select_entity
  : select_clause
      [from_clause]
      [where_clause]
      [group_by_clause]
      [having_clause]
  | select_set_entity set_operator [ALL] select_set_entity 

select_set_entity
  : select_entity
  | (select_query)
with_clause
With Clause
select_clause
Select Clause
from_clause
From Clause
where_clause
Where Clause
group_by_clause
Group By Clause
having_clause
Having Clause
order_by_clause
Order By Clause
limit_clause
Limit Clause
set_operator
Set Operators

With Clause

WITH common_table_expression [, common_table_expression ...]
common_table_expression
Common Table Expression

Select Clause

SELECT [DISTINCT] field [, field ...]

Distinct

You can use DISTINCT keyword to retrieve only unique records.

field syntax

field
  : value
  | value AS alias
value
value
alias
identifier

From Clause

FROM table [, {table|LATERAL laterable_table} ...]

If multiple tables have been enumerated, tables are joined using cross join.

table syntax

table
  : table_entity
  | table_entity alias 
  | table_entity AS alias
  | join
  | DUAL
  | laterable_table
  | (table)

table_entity
  : table_identifier
  | table_object
  | json_inline_table

table_identifier
  : table_name
  | STDIN

laterable_table
  : subquery
  | subquery alias
  | subquery AS alias

subquery
  : (select_query)

join
  : table CROSS JOIN table
  | table [INNER] JOIN table join_condition
  | table {LEFT|RIGHT|FULL} [OUTER] JOIN table join_condition
  | table NATURAL [INNER] JOIN table
  | table NATURAL {LEFT|RIGHT|FULL} [OUTER] JOIN table
  | table CROSS JOIN LATERAL laterable_table
  | table [INNER] JOIN LATERAL laterable_table join_condition
  | table LEFT [OUTER] JOIN LATERAL laterable_table join_condition
  | table NATURAL [INNER] JOIN LATERAL laterable_table
  | table NATURAL LEFT [OUTER] JOIN LATERAL laterable_table

join_condition
  : ON condition
  | USING (column_name [, column_name, ...])

table_object
  : CSV(delimiter, table_identifier [, encoding [, no_header [, without_null]]])
  | FIXED(delimiter_positions, table_identifier [, encoding [, no_header [, without_null]]])
  | JSON(json_query, table_identifier)
  | LTSV(table_identifier [, encoding [, without_null]])

json_inline_table
  : JSON_TABLE(json_query, json_file)
  | JSON_TABLE(json_query, json_data)

table_name
identifier

A table_name represents a file path, a temporary table, or a inline table. You can use absolute path or relative path from the directory specified by the ”–repository” option as a file path.

When the file name extension is “.csv”, “.tsv”, “.json” or “.txt”, the format to be loaded is automatically determined by the file extension and you can omit it.

FROM `user.csv`          -- Relative path
FROM `/path/to/user.csv` -- Absolute path
FROM user                -- Relative path without file extension

The specifications of the command options are used as file attributes such as encoding to be loaded. If you want to specify the different attributes for each file, you can use table_object expressions for each file to load.

Once a file is loaded, then the data is cached and it can be loaded with only file name after that within the transaction.

alias
identifier

If alias is not specified, table_name stripped its directory path and extension is used as alias.

-- Following expressions are equivalent
FROM `/path/to/user.csv`
FROM `/path/to/user.csv` AS user
select_query
Select Query
condition
value
column_name
identifier
json_query
JSON Query

Empty string is equivalent to “{}”.

json_file
identifier

A json_file represents a json file path. You can use absolute path or relative path from the directory specified by the ”–repository” option as a json file path.

If a file name extension is “.json”, you can omit it.

json_data
string
delimiter
string
delimiter_positions
string

“SPACES” or JSON Array of integers

encoding
string or identifier

“AUTO”, “UTF8”, “UTF8M”, “UTF16”, “UTF16BE”, “UTF16LE”, “UTF16BEM”, “UTF16LEM” or “SJIS”.

no_header
boolean
without_null
boolean

A Table Object Expression for JSON loads data from JSON file, and you can operate the data. A JSON Table Expression can load data from JSON file as well, but the result is treated as a inline table, so you can only refer the result within the query.

Special Tables

DUAL
The dual table has one column and one record, and the only field is empty. This table is used to retrieve pseudo columns.
STDIN
The stdin table loads data from pipe or redirection as a csv data. The stdin table is one of temporary tables that is declared automatically. This table cannot to be used in the interactive shell.

Where Clause

The Where clause is used to filter records.

WHERE condition
condition
value

Group By Clause

The Group By clause is used to group records.

GROUP BY field [, field ...] 
field
value

Having Clause

The Having clause is used to filter grouped records.

HAVING condition
condition
value

Order By Clause

The Order By clause is used to sort records.

ORDER BY order_item [, order_item ...]

order item

order_item
  : field [order_direction] [null_position]
  
order_direction
  : {ASC|DESC}
  
null_position
  : NULLS {FIRST|LAST}
field
value

If DISTINCT keyword is specified in the select clause, you can use only enumerated fields in the select clause as field.

order_direction
ASC sorts records in ascending order. DESC sorts in descending order. ASC is the default.
null_position
FIRST puts null values first. LAST puts null values last. If order_direction is specified as ASC then FIRST is the default, otherwise LAST is the default.

Limit Clause

The Limit clause is used to specify the maximum number of records to return and exclude the first set of records.

limit_clause
  : LIMIT number_of_records [{ROW|ROWS}] [{ONLY|WITH TIES}] [offset_clause]
  | LIMIT percentage PERCENT [{ONLY|WITH TIES}] [offset_clause]
  | [offset_clause] FETCH {FIRST|NEXT} number_of_records {ROW|ROWS} [{ONLY|WITH TIES}]
  | [offset_clause] FETCH {FIRST|NEXT} percentage PERCENT [{ONLY|WITH TIES}]
  | offset_clause

offset_clause
  : OFFSET number_of_records [{ROW|ROWS}]
number_of_records
integer
percent
float

ROW and ROWS after number_of_records, FIRST and NEXT after FETCH, and ONLY keyword does not affect the result.

If PERCENT keyword is specified, maximum number of records is percentage percent of the result set that includes the excluded records by offset_clause.

If WITH TIES keywords are specified, all records that have the same sort keys specified by Order By Clause as the last record of the limited records are included in the records to return. If there is no Order By Clause in the query, WITH TIES keywords are ignored.