Presto

Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Its connector architecture lets it query data from many data sources.

This document describes how to set up Presto to query YugabyteDB's YCQL tables.

1. Start local cluster

Follow the Quick start instructions to run a local YugabyteDB cluster. Then test YugabyteDB's Cassandra-compatible YCQL API as documented there, to confirm that a Cassandra-compatible service is running on localhost:9042. Make sure that you have created the keyspace and table, and inserted the sample data, as described in those instructions.

2. Download and configure Presto

Detailed steps are in the Presto deployment documentation. The following are the minimal setup steps for getting started:

$ wget https://repo1.maven.org/maven2/io/prestosql/presto-server/309/presto-server-309.tar.gz
$ tar xvf presto-server-309.tar.gz
$ cd presto-server-309

Create the “etc”, “etc/catalog”, and “data” directories inside the installation directory:

$ mkdir etc
$ mkdir etc/catalog
$ mkdir data

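Create the node.properties file, replacing <username> with your user name. The node.data-dir path below assumes macOS and an installation in your home directory; adjust it if your layout differs.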

$ cat > etc/node.properties
node.environment=test
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/Users/<username>/presto-server-309/data

Press Ctrl-D after you have pasted the file contents.
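
As an alternative to pasting into cat and pressing Ctrl-D, the same file can be written in one step with a here-document. The following is a minimal sketch, assuming the current directory is the presto-server-309 installation directory; it uses $(pwd) instead of a hard-coded /Users/<username> path, which is a convenience chosen here rather than something the Presto documentation prescribes.

```shell
# Assumption: run from inside the presto-server-309 installation directory.
mkdir -p etc data

# The EOF delimiter is unquoted on purpose so that $(pwd) is expanded
# and the absolute path of the data directory is written into the file.
cat > etc/node.properties <<EOF
node.environment=test
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=$(pwd)/data
EOF
```

The other configuration files in this section can be written the same way; quote the delimiter (<<'EOF') for files that should be written verbatim.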

Create the jvm.config file:

$ cat > etc/jvm.config
-server
-Xmx6G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Press Ctrl-D after you have pasted the file contents.

Create the config.properties file:

$ cat > etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

Press Ctrl-D after you have pasted the file contents.

Create the log.properties file:

$ cat > etc/log.properties
io.prestosql=INFO

Press Ctrl-D after you have pasted the file contents.

Configure the Cassandra connector for YugabyteDB

Create the Cassandra catalog properties file in the etc/catalog directory. Detailed instructions are in the Presto Cassandra connector documentation.

$ cat > etc/catalog/cassandra.properties
connector.name=cassandra
cassandra.contact-points=127.0.0.1

Press Ctrl-D after you have pasted the file contents.
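
Before launching the server, it can help to confirm that every file created above exists and is not empty, since an interrupted paste leaves an empty file behind. The following is a rough sketch; check_presto_config is a hypothetical helper written for this guide, not part of Presto.

```shell
# check_presto_config DIR: hypothetical helper (not part of Presto).
# Reports each expected configuration file under DIR/etc that is missing
# or empty, and returns non-zero if any such file is found.
check_presto_config() {
  dir="${1:-.}"
  missing=0
  for f in node.properties jvm.config config.properties log.properties \
           catalog/cassandra.properties; do
    if [ ! -s "$dir/etc/$f" ]; then
      echo "missing or empty: $dir/etc/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the installation directory you are currently in.
if check_presto_config .; then
  echo "all Presto config files are in place"
fi
```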

3. Download Presto CLI

$ cd ~/presto-server-309/bin
$ wget https://repo1.maven.org/maven2/io/prestosql/presto-cli/309/presto-cli-309-executable.jar

Rename the JAR file to presto and make it executable. It is a self-executing JAR, so it can be run like a regular binary.

$ mv presto-cli-309-executable.jar presto && chmod +x presto

4. Launch Presto server

$ cd ~/presto-server-309

To run in foreground mode:

$ ./bin/launcher run

To run in background mode:

$ ./bin/launcher start
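
In background mode the launcher returns before the server is ready to accept queries. You can check the process with ./bin/launcher status. To wait until the HTTP endpoint responds, something like the following sketch may be useful. It assumes curl is installed and that the server's /v1/info endpoint reports a "starting" field that becomes false once startup completes; wait_for_presto is a hypothetical helper, not a Presto command.

```shell
# wait_for_presto URL TRIES: hypothetical helper (not part of Presto).
# Polls URL/v1/info about once per second until the response reports
# "starting": false, or gives up after TRIES attempts.
# Assumption: /v1/info returns JSON containing a "starting" field.
wait_for_presto() {
  url="${1:-http://localhost:8080}"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    if curl -fsS "$url/v1/info" 2>/dev/null \
        | grep -Eq '"starting" *: *false'; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_presto http://localhost:8080 60 returns success as soon as the server reports that it has finished starting, and fails if it has not done so after 60 attempts.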

5. Test Presto queries

Use the presto CLI to run ad-hoc queries:

$ ./bin/presto --server localhost:8080 --catalog cassandra --schema default

Switch to the myapp keyspace:

presto:default> use myapp;
USE

Show the tables available:

presto:myapp> show tables;
 Table
-------
 stock_market
(1 row)

Describe a particular table:

presto:myapp> describe stock_market;
    Column     |  Type   | Extra | Comment
---------------+---------+-------+---------
 stock_symbol  | varchar |       |
 ts            | varchar |       |
 current_price | real    |       |
(3 rows)

Query with a filter:

presto:myapp> select * from stock_market where stock_symbol = 'AAPL';
 stock_symbol |         ts          | current_price
--------------+---------------------+---------------
 AAPL         | 2017-10-26 09:00:00 |        157.41
 AAPL         | 2017-10-26 10:00:00 |         157.0
(2 rows)

Query with aggregates:

presto:myapp> select stock_symbol, avg(current_price) from stock_market group by stock_symbol;
 stock_symbol |  _col1
--------------+---------
 GOOG         | 972.235
 AAPL         | 157.205
 FB           | 170.365
(3 rows)
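
The queries above can also be run non-interactively, which is convenient in scripts. The following sketch wraps the CLI's --execute option; presto_query and the PRESTO_CLI variable are hypothetical names introduced here, and the server, catalog, and schema values are the ones used earlier in this document.

```shell
# presto_query SQL: hypothetical wrapper (not part of Presto) that runs a
# single statement through the CLI's --execute option and then exits.
# PRESTO_CLI is an assumed variable; it defaults to the renamed CLI binary.
presto_query() {
  "${PRESTO_CLI:-./bin/presto}" \
    --server localhost:8080 \
    --catalog cassandra \
    --schema myapp \
    --execute "$1"
}
```

For example, from the installation directory: presto_query "select stock_symbol, avg(current_price) from stock_market group by stock_symbol".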