Tune queries in YugabyteDB

A YugabyteDB cluster consists of multiple nodes running the YB-TServer server process. Due to its distributed nature, tuning a YugabyteDB database requires keeping in mind that, unlike most common databases where only a local machine influences a database query, data is stored across all the nodes of the cluster. This section provides an introduction to tuning YugabyteDB and the tools available.

Before you begin

Before trying to optimize individual statements, make sure the YugabyteDB cluster is running optimally:

  • Verify that all nodes that are supposed to be running are indeed running.
  • Verify that the YB-TServer and, where applicable, YB-Master processes are running on the nodes.

To view the nodes and servers that make up your cluster, use the yb-admin command to request the master and tablet servers. For example:

$ ./bin/yb-admin -init_master_addrs=$(hostname):7100 list_all_masters
Master UUID                        RPC Host/Port           State       Role
a637b88dfc0c4476862ca79872d763d7   172.158.22.229:7100     ALIVE       LEADER
968eb4a7a2e24c3ab95929d3a888ed05   172.158.39.23:7100      ALIVE       FOLLOWER
44e13e096955412a88c6ffd65a1e241c   172.158.50.212:7100     ALIVE       FOLLOWER
$ ./bin/yb-admin -init_master_addrs=$(hostname):7100 list_all_tablet_servers
Tablet Server UUID               RPC Host/Port       Heartbeat delay Status   Reads/s
d80150e0eeda4477a231968440dd89a9 172.158.50.212:9100 0.20s           ALIVE    ..
721661fd5d2044179e03707a862aa578 172.158.22.229:9100 0.39s           ALIVE    ..
54e17c914c0f4679aa7b07a9c7a8ddf5 172.158.39.23:9100  0.48s           ALIVE    ..

Next, make sure enough disk space is available, elementary components such as CPU, disk, and the network do not report errors, and the operating system does not report any malfunction.

For more information on troubleshooting cluster- and node-level issues, refer to Troubleshoot YugabyteDB.

After you know the entire cluster is running correctly, you can move to statement tuning.

Tuning in YSQL

To investigate a specific apparently slow running query, you first need to determine which node the query is running on. To identify the node on which the query is running, log on to the nodes and use the pg_stat_activity view to see running queries.

Note

Remember that even for a single query running on one node, the whole cluster can show activity, because the storage layer is distributed over all nodes.

After you identify the correct node, YSQL provides two views you can use to identify SQL statements and their performance characteristics:

  • pg_stat_activity provides an overview, including SQL, of current activity. Apart from the query text, it does not provide information about the SQL executed; rather it shows the current state of the PostgreSQL backend.

  • pg_stat_statements provides historic information about executed SQL statements. Currently, no 'block'/'blk' information is recorded (which is logical and physical I/O information), because the PostgreSQL I/O codepath is not used.

Which view you use for performance tuning depends on the performance issue and whether you can witness the performance problem happening live, or if it was a performance problem in the past.

Note that identical queries might not perform identically because of differences in data.

After you identify a problematic SQL statement, the next step is to look at the execution plan using EXPLAIN. An execution plan is the order and method of execution to perform the work requested in the SQL statement. This plan is generated by the database planner.

YugabyteDB uses PostgreSQL’s cost-based optimizer, which estimates the costs of each possible execution plan for an SQL statement. The planner calculates the cheapest plan to execute a query, assuming that is also the fastest and the best plan, based on statistics derived from the database table and supporting structures like secondary indexes. The execution plan with the lowest cost finally is executed.

Note

Currently, YugabyteDB does not perform size-specific costing, and assumes every plan generates a thousand rows, which translates to a cost of a thousand.

You can use EXPLAIN in two modes:

  • The default mode shows the plan that the planner created and the predicted cost.

  • EXPLAIN ANALYZE mode executes the query, measures the time and actual work done, and adds these statistics to the execution plan as 'actual'.

Both the predicted and actual statistics have their uses. The predicted statistics are what the planner uses, and assuming the statistics don't change, the planner will come up with the same execution plan the next time the query is run, even if it caused performance issues in the past. The actual statistics show how much and what work was actually performed, and thus are beneficial to see the effectiveness of the execution plan.

Tip

Complex execution plans can be difficult to read or understand. Use online tools to calculate and visualize execution plan timelines. For example, https://explain.dalibo.com.

Performance tuning tools in YugabyteDB

YugabyteDB provides a number of familiar tools you can use for tuning queries.

Find slow queries

Use the pg_stat_statements extension to get statistics on past queries. Using pg_stat_statements, you can investigate queries by userid and dbid, calls, rows, and min, max, mean, standard deviation and total time.

The pg_stat_statements extension module is installed by default, but must be enabled for a database before you can query the pg_stat_statements view.

CREATE EXTENSION if not exists pg_stat_statements;

To get the output of pg_stat_statements in JSON format, visit https://<yb-tserver-ip>:13000/statements in your web browser, where <yb-tserver-ip> is the IP address of any YB-TServer node in your cluster.

For more information, refer to Get query statistics using pg_stat_statements.

View live queries

Use the pg_stat_activity view to get information on currently running tasks. Using pg_stat_activity you can identify inactive, active, and long time active sessions, and get process information and the current query.

To get the output of pg_stat_activity in JSON format, visit https://<yb-tserver-ip>:13000/rpcz in your web browser, where <yb-tserver-ip> is the IP address of any YB-TServer node in your cluster.

For more information, refer to View live queries with pg_stat_activity.

View COPY operation status

Use the pg_stat_progress_copy view to get status information of a COPY command execution. In addition to the COPY status, pg_stat_progress_copy provides the number of tuples processed and other additional information, and retains the COPY progress report after the command execution.

For more information, refer to View COPY progress with pg_stat_progress_copy.

View plans with EXPLAIN

Like PostgreSQL, YugabyteDB provides the EXPLAIN statement to show the query execution plan generated by YSQL for a given SQL statement. Using EXPLAIN, you can discover where in the query plan the query is spending most of its time, and using this information, decide on the best approach for improving query performance. This could include strategies such as adding an index or changing index sort order.

For more information, refer to Analyze queries with EXPLAIN.

Advanced tools

Use the following tools to log slow-running queries and optimize queries using hint plans.

Log slow queries

You can set the --ysql_log_min_duration_statement flag to help track down slow queries. When configured, YugabyteDB logs the duration of each completed SQL statement that runs the specified duration (in milliseconds) or longer. (Setting the value to 0 prints all statement durations.)

$ ./bin/yb-tserver --ysql_log_min_duration_statement 1000

Example log output:

2021-05-26 21:13:49.701 EDT [73005] LOG:  duration: 34.378 ms  statement: SELECT c.oid,
        n.nspname,
        c.relname
    FROM pg_catalog.pg_class c
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relname OPERATOR(pg_catalog.~) '^(products)$'
        AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 2, 3;

Results are written to the current postgres*log file. For information on the YB-TServer logs, refer to YB-TServer logs.

Note

Depending on the database and the work being performed, long-running queries don't necessarily need to be optimized.

Ensure that the threshold is high enough so that you don't flood the postgres*log log files.

For more information on flags for configuring the YB-TServer server, refer to YSQL Flags.

Use a hint plan

YugabyteDB uses the PostgreSQL pg_hint_plan extension to control query execution plans with hints.

pg_hint_plan makes it possible to influence the query planner using so-called "hints", which are C-style comments that use a special syntax.

Note

To use pg_hint_plan effectively, you need a thorough knowledge of how your application will be deployed. Hint plans also need to be revisited when the database grows or the deployment changes to ensure that the plan is not limiting performance rather than optimizing it.

For more information, refer to Optimizing YSQL queries using pg_hint_plan.