View COPY status with pg_stat_progress_copy

YugabyteDB supports the PostgreSQL pg_stat_progress_copy view to report the progress of the COPY command execution. Whenever COPY is running, the pg_stat_progress_copy view contains one row for each client connection that is currently running a COPY command.

The following table describes the view columns:

Column Description
pid Process ID of backend.
datid Object ID of the database to which this backend is connected.
datname Name of the database to which this backend is connected.
relid Object ID of the table on which the COPY command is executed. It is set to 0 if copying from a SELECT query.
command The command that is running COPY FROM, or COPY TO.
type The I/O type that the data is read from or written to: FILE, PROGRAM, PIPE (for COPY FROM STDIN and COPY TO STDOUT), or CALLBACK (used for example during the initial table synchronization in logical replication).
yb_status Tracking status of the copy command.
bytes_processed Number of bytes already processed by the COPY command.
bytes_total Size of the source file for the COPY FROM command in bytes. Set to 0 if not available.
tuples_processed Number of tuples already processed by the COPY command.
tuples_excluded Number of tuples not processed because they were excluded by the WHERE clause of the COPY command.

YugabyteDB-specific changes

The pg_stat_progress_copy view includes the following YugabyteDB-specific changes:

Definition of tuples_processed

The definition of tuples_processed column is different in YugabyteDB in comparison to PostgreSQL.

In YugabyteDB, the ROWS_PER_TRANSACTION option is added to the COPY command, defining the transaction size to be used. For example, if the total tuples to be copied is 5000 and ROWS_PER_TRANSACTION is set to 1000, the database creates 5 transactions and each transaction inserts 1000 rows. If there is an error during execution, then some tuples can be persisted based on the already completed transaction.

Because each COPY is divided into multiple transactions, tuples_processed tracks the rows that the transaction has already completed.

For more information, refer to ROWS_PER_TRANSACTION.

New column yb_status

In YugabyteDB, the pg_stat_progress_copy view includes the column yb_status to indicate the status of the COPY command.

If a COPY command terminates due to any error, then it's possible that some tuples are persisted as explained in the tuples_processed section. In this case, tuples_processed shows a non-zero count and yb_status shows that the COPY command was terminated due to an error. This is helpful for discovering whether the copy completed or not. This is unnecessary in PostgreSQL, where copy is a single transaction.

yb_status can have the following values:

  • IN PROGRESS – Indicating that the COPY command is still running.
  • ERROR – COPY command was terminated due to an error.
  • SUCCESS – COPY command successfully completed.

COPY command information

The pg_stat_progress_copy view retains copy command information after the copy operation has completed.

In PostgreSQL, when the COPY command finishes, the row containing details of the copy command is removed from the view. In YugabyteDB, the information is retained in the view after the copy has finished. The information is retained only for the last executed copy command on that connection.

This is required for YugabyteDB, because if the COPY command finishes due to an error, then you would like to know how many rows were reliably persisted to the disk.

Examples

The following examples demonstrate the possible stages (IN PROGRESS, ERROR, SUCCESS) for the copy operation.

Prerequisites

Before you start

The examples will run on a single-node local YugabyteDB universe.
To create a universe, see Set up YugabyteDB universe.

  • Run the following script from your YugabyteDB installation directory to generate data to a file test_data.csv.

    #!/bin/bash
    rm -rf test_data.csv
    for i in {1..100000}
    do
       echo "$i, $i, $i" >> test_data.csv
    done
    
  • From your local YugabyteDB installation directory, connect to the YSQL shell, and create a table using the following schema:

    create table test_copy ( h1 int, r1 int, v1 int, primary key (h1, r1));
    

Run a successful COPY command

  1. Copy test_data.csv using the following copy command:

    \copy test_copy from 'test_data.csv'  (DELIMITER ',');
    
    COPY 100000
    
  2. Verify yb_status using the view.

    select * from pg_stat_progress_copy ;
    

    You should see output similar to the following:

     pid  | datid | datname  | relid |  command  | type | yb_status | bytes_processed | bytes_total | tuples_processed | tuples_excluded
     -----+-------+----------+-------+-----------+------+-----------+-----------------+-------------+------------------+-----------------
    74390 | 13288 | yugabyte | 16394 | COPY FROM | PIPE | SUCCESS   |         1966685 |           0 |           100000 |               0
    (1 row)
    

Verify a COPY command status in progress

Copy test_data.csv using the following copy command and check the status in parallel from another terminal.

\copy test_copy from 'test_data.csv'  (DELIMITER ',');
select * from pg_stat_progress_copy ;

You should see output similar to the following:

 pid  | datid | datname  | relid |  command  | type |  yb_status  | bytes_processed | bytes_total | tuples_processed | tuples_excluded
------+-------+----------+-------+-----------+------+-------------+-----------------+-------------+------------------+-----------------
74390 | 13288 | yugabyte | 16404 | COPY FROM | PIPE | IN PROGRESS |          766682 |           0 |            40000 |               0
(1 row)

Verify an interrupted COPY operation

  1. Copy test_data.csv using the following command, and then interrupt it using Ctrl+C:

    \copy test_copy from 'test_data.csv'  (DELIMITER ',');
    
    ^CCancel request sent
    ERROR:  canceling statement due to user request
    CONTEXT:  COPY test_copy, line 55184: "55184, 55184, 55184"
    
  2. Verify yb_status using the view.

    select * from pg_stat_progress_copy ;
    
      pid  | datid | datname  | relid |  command  | type | yb_status | bytes_processed | bytes_total | tuples_processed | tuples_excluded
    -------+-------+----------+-------+-----------+------+-----------+-----------------+-------------+------------------+-----------------
     74390 | 13288 | yugabyte | 16399 | COPY FROM | PIPE | ERROR     |          766682 |           0 |            40000 |               0
    (1 row)
    

    The default for ROWS_PER_TRANSACTION is 20K and the interruption occurred on row 55184 in the example. As a result, the first two transactions of 20K rows were persisted resulting in 40K rows reported by tuples_processed. The next in-progress transaction was interrupted, and for this reason the remaining 15184 rows were not persisted.

Learn more