The WITH clause and common table expressions (CTEs) [YSQL]

The WITH clause and common table expressions

A WITH clause can be used as part of a SELECT statement, an INSERT statement, an UPDATE statement, or a DELETE statement. For this reason, the functionality is described in this dedicated section.

Introduction

The WITH clause lets you name one or several so-called common table expressions. This latter term is a term of art, and doesn't reflect the spellings of SQL keywords. It is normally abbreviated to CTE, and this acronym will be used throughout the rest of the whole of this "WITH clause" section. See the section WITH clause—SQL syntax and semantics for the formal treatment.

Briefly, a CTE names a SQL [sub]statement which may be any of these kinds: SELECT, VALUES, INSERT, UPDATE, or DELETE. And the WITH clause is legal at the start of any of these kinds of statement : SELECT, INSERT, UPDATE, or DELETE. (VALUES is missing from the second list.) There are two kinds of CTE: the ordinary kind; and the recursive kind.

The statement text that the ordinary kind of CTE names has the same syntax as the corresponding SQL statement that you issue at top level. However, a recursive CTE may be used only in a WITH clause.

A CTE can, for example, be used to provide values for, say, an INSERT like this:

set client_min_messages = warning;
drop table if exists t1 cascade;
create table t1(k int primary key, v int not null);

with a(k, v) as (
  select g.v, g.v*2 from generate_series(11, 20) as g(v)
  )
insert into t1
select k, v from a;

select k, v from t1 order by k;

This component is an example of a CTE:

a(k, v) as (select g.v, g.v*2 from generate_series(11, 20) as g(v))

Notice the (optional) parenthesised parameter list that follows the name, just as in the definition of a schema-level VIEW.

This is the result:

 k  | v
----+----
 11 | 22
 12 | 24
 13 | 26
 14 | 28
 15 | 30
 16 | 32
 17 | 34
 18 | 36
 19 | 38
 20 | 40

When a data-changing [sub]statement (INSERT, UPDATE , or DELETE) is named in a CTE, and, when it uses a RETURNING clause, the returned values can be used in other CTEs and in the overall statement's final [sub]statement. Here is an example.

set client_min_messages = warning;
drop table if exists t2 cascade;
create table t2(k int primary key, v int not null);

with moved_rows(k, v) as (
  delete from t1
  where k > 15
  returning k, v)
insert into t2(k, v)
select k, v from moved_rows;

(
  select 't1' as table_name, k, v from t1
  union all
  select 't2' as table_name, k, v from t2
  )
order by table_name, k;

This is the result:

 table_name | k  | v
------------+----+----
 t1         | 11 | 22
 t1         | 12 | 24
 t1         | 13 | 26
 t1         | 14 | 28
 t1         | 15 | 30
 t2         | 16 | 32
 t2         | 17 | 34
 t2         | 18 | 36
 t2         | 19 | 38
 t2         | 20 | 40

The central notion is that each CTE that you name in a WITH clause can then be invoked by its name, either in a subsequent CTE in that WITH clause or in the overall statement's final, main, [sub]statement. In this way, a CTE is analogous, in the declarative programming domain of SQL, to a procedure or a function in the domain of an "if-then-else" programming language, bringing the modular programming benefit of hiding names, and the implementations that they stand for, from scopes that have no interest in them.

Finally, the use of a recursive CTE in a WITH clause enables advanced functionality, like graph analysis. For example, an "employees" table often has a self-referential foreign key like "manager_id" that points to the table's primary key, "employee_id". SELECT statements that use a recursive CTE allow the reporting structure to be presented in various ways. This result shows the reporting paths of employees, in an organization with a strict hierarchical reporting scheme, in depth-first order. See the section Pretty-printing the top-down depth-first report of paths.

 emps hierarchy
----------------
 mary
   fred
     alfie
     dick
   george
   john
     alice
     bill
       joan
     edgar
   susan

The remainder of this section has the following subsections:

Performance considerations

A SQL statement that uses a WITH clause sometimes gets a worse execution plan than the semantically equivalent statement that doesn’t use a WITH clause. The explanation is usually that a “push-down” optimization of a restriction or projection hasn’t penetrated into the WITH clause’s CTE. You can usually avoid this problem by manually pushing down what you’d hope would be done automatically into your spellings of the WITH clause’s CTEs.

Anyway, the ordinary good practice principle holds even more here: always check the execution plans of the SQL statements that your application issues, on representative samples of data, before moving the code to the production environment.

Downloadable WITH clause demonstration scripts

The recursive-cte-code-examples.zip file contains the .sql scripts that illustrate the use of the recursive CTE:

All of these studies make heavy use of regular (non-recursive) CTEs. They therefore show the power of the CTE in a natural, rather than a contrived, way.

After unzipping it in a convenient new directory, you'll see a README.txt. It tells you how to start, in turn, a few master-scripts. Simply start each in ysqlsh. You can run these time and again. Each one always finishes silently. You can see the reports that they produce on the dedicated spool directories and confirm that the files that are spooled are identical to the corresponding reference copies that are delivered in the zip-file.