YSQL window functions

Window functions

If you are already familiar with window functions, then you can skip straight to the syntax and semantics section or the section that lists all of the YSQL window functions and that links, in turn, to the definitive account of each function.

This page has only the Synopsis section and the section Organization of the window functions documentation section.

Note: Some relational database systems use the term analytic function for what YSQL and PostgreSQL call a window function.

About the code examples

Each of the code examples uses one of the four tables "t1", "t2", "t3", or "t4". And none of them does any inserts, updates, or deletes. It will speed up your use of these examples it you install all of the tables in one sitting. Start, therefore, with the section The data sets used by the code examples. Then carry on with your study of the rest of this major section.

Synopsis

A window function operates, in general, on each of the set of windows into which the rows of its input row set are divided, and it produces one value for every row in each of the windows. A window is defined by having the same values for a set of one or more classifier columns. In the limit, there will be just a single window. In general, the output value of a window function for a particular row takes into account values from all of the rows in the successive windows on which it operates.

A window function can be invoked only in a select list item in a subquery; and specific syntax, using the OVER clause is essential. The argument of this clause is the window_definition . The authoritative account is given in the section Window function invocation—SQL syntax and semantics. Here is a brief overview. A window definition can have up to three clauses, thus:

  • The PARTITION BY clause. This specifies the rule that divides the input row set into two or more windows. If it's omitted, then the input row set is treated as a single window.
  • The window ORDER BY clause. This specifies the rule that determines how the rows within each window are ordered. If it's omitted, then the ordering is unpredictable—and therefore the output of the window function is meaningless.
  • The frame_clause. This is significant for only some of the eleven window functions. It determines which rows within each window are taken as the input for the window function. For example, you can use it to exclude the current row; or you can say that the function should look only at the rows in the window from the start through the current row.

Examples of each of these clauses are shown in the section Informal overview of function invocation using the OVER clause.

Window functions are similar to aggregate functions in this way:

  • Each operates on each of possibly many windows of a row set.

Note: As mentioned above, the row set for a window function can be defined only by using the PARTITION BY clause within a window_definition. The row set for an aggregate function may be defined in this way. But it may alternatively be defined by the regular GROUP BY clause at the syntax spot in a subquery that follows the WHERE clause's spot.

Window functions differ from aggregate functions in this way:

  • A window function produces, in general, a different output value for each different input row in the window.
  • An aggregate function, when it's invoked in the same way as a window function, often produces the same value for each different input row in the window. (The exact behavior depends on what the frame clause specifies.)
  • When an aggregate function is invoked using the regular GROUP BY clause, it produces a single value for each whole window that the GROUP BY clause defines.

Organization of the window functions documentation

The remaining pages are organized as follows:

Informal overview of function invocation using the OVER clause: here

Skip this section entirely if you are already familiar with window functions. It presents five code examples. This section focuses on the effect that each illustrated function has. It leaves their formal definitions to the invocation syntax and semantics section.

Window function invocation—SQL syntax and semantics: here

This section presents the formal treatment of the syntax and semantics of how a window function, or an aggregate function, is invoked as a special kind of SELECT list item in conjunction with the OVER keyword.

Signature and purpose of each window function: here

The following list groups the eleven window functions in the same way that the sidebar items group them. The rationale for the grouping is explained in the referenced sections.

      row_number()
      rank()
      dense_rank()

      percent_rank()
      cume_dist()
      ntile()

      first_value()
      nth_value()
      last_value()

      lag()
      lead()

      The data sets used by the code examples

Analyzing a normal distribution with percent_rank(), cume_dist() and ntile(): here

Regard this section as an optional extra. It answers an interesting question:

  • If you want to allocate a row set into buckets where each contains the same number of rows, based on your ordering rule, is there any difference between the result produced by using, in turn, each of the three functions percent_rank(), cume_dist(), or ntile()?

The answer is, of course, "Yes". But it's a qualified "Yes" because when certain conditions hold, there is no difference. The value that the study brings is due to the fact that it aims to meet a high-level goal rather than to demonstrate the basic use of low-level primitives. This means that it necessarily combines the basic use of the window functions under study with all sorts of other generic SQL techniques (and even stored procedure techniques) because these are needed to meet the goal. This kind of bigger-picture use typically goes hand-in-hand with any serious use of window functions.