Query Data Introduction
You can query local and distributed file systems, Hive, HBase data, complex data, INFORMATION_SCHEMA, and system tables as described in the subtopics of this section.
A query specifies the location of the data source and, where necessary, casts the data to an appropriate type.
Specifying the Data Source Location
The optional USE statement runs subsequent queries against a particular storage plugin, which typically saves you from typing the storage plugin information in the FROM clause of each query. If you omit the USE statement, specify the storage plugin, such as dfs, optionally a workspace, such as default, and the path to the data source, using dot notation and backticks. For example:
dfs.`default`.`/Users/drill-user/apache-drill-1.1.0/log/sqlline_queries.json`;
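As a minimal sketch of the two approaches, the queries below read the same file, first with the full location in the FROM clause and then after a USE statement. The path is the one shown above; the LIMIT value is arbitrary, and the example assumes a dfs storage plugin with a default workspace is configured.

```sql
-- Without USE: name the storage plugin, workspace, and path in the FROM clause.
SELECT * FROM dfs.`default`.`/Users/drill-user/apache-drill-1.1.0/log/sqlline_queries.json` LIMIT 5;

-- With USE: set the storage plugin and workspace once, then give only the path.
USE dfs.`default`;
SELECT * FROM `/Users/drill-user/apache-drill-1.1.0/log/sqlline_queries.json` LIMIT 5;
```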
Casting Data
In some cases, Drill implicitly converts schema-less data to correctly typed data. In these cases, you do not need to cast the data to another type. The file format of the data and the nature of your query determine whether casting or converting is required. Differences in casting depend on the data source.
For example, you have to cast a string "100" in a JSON file to an integer in order to apply a math function or an aggregate function.
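The JSON case can be sketched as follows. The file /tmp/orders.json and its quantity field are hypothetical and serve only as an illustration; the example assumes each record stores quantity as a string such as "100".

```sql
-- Hypothetical file and field: quantity is stored as a string in the JSON data,
-- so it is cast to an integer before the aggregate function is applied.
SELECT SUM(CAST(t.quantity AS INT)) AS total_quantity
FROM dfs.`/tmp/orders.json` AS t;
```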
To query HBase data using Drill, convert each column of an HBase table from a byte array to an SQL data type when reading, and from an SQL data type to a byte array when writing or comparing values, as described in the section "Querying HBase". Use the CONVERT_TO and CONVERT_FROM functions to perform these conversions.
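A read-side conversion might look like the sketch below. The table students, the column family account, and the qualifier name are hypothetical, and the example assumes an HBase storage plugin registered under the name hbase and values stored as UTF-8 encoded bytes.

```sql
-- Hypothetical table and columns: decode the row key and one column
-- from HBase byte arrays into strings.
SELECT CONVERT_FROM(row_key, 'UTF8') AS student_id,
       CONVERT_FROM(s.account.name, 'UTF8') AS student_name
FROM hbase.students AS s;
```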
Troubleshooting Queries
In addition to testing queries interactively in the Drill shell and examining error messages, use the EXPLAIN command to analyze errors and troubleshoot queries that do not run. For example, if you run into a casting error, the query plan text may help you isolate the problem.
0: jdbc:drill:zk=local> !set maxwidth 10000
0: jdbc:drill:zk=local> explain plan for select ... ;
Use the Drill shell command !set <set variable> <value>, as in the !set maxwidth example above, to increase the default text display (number of characters). By default, most of the plan output is hidden.
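For illustration only, a complete EXPLAIN statement could look like the following sketch. It reuses the log file path from earlier in this section as a stand-in for whatever query you are troubleshooting.

```sql
-- Show the query plan instead of running the query itself.
EXPLAIN PLAN FOR
SELECT * FROM dfs.`default`.`/Users/drill-user/apache-drill-1.1.0/log/sqlline_queries.json`;
```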
Query Syntax Tips
Remember the following tips when querying data with Drill:
- Include a semicolon at the end of SQL statements, except when you issue a Drill shell command.
  Example: !set maxwidth 10000
- Use backticks around keywords, special characters, and identifiers that SQL cannot parse, such as the keyword default and a path that contains a forward slash character.
  Example:
  SELECT * FROM dfs.`default`.`/Users/drilluser/apache-drill-1.1.0/sample-data/nation.parquet`;
- When selecting all (SELECT *) schema-less data, the order of returned columns might differ from the stored order and might vary from query to query.
