Subprogram overloading

Read the accounts of data type conversion and subprogram overloading in the PostgreSQL documentation.

The possibility of overloading brings the risk of overall application behavior changes when a new overload of an existing subprogram's name is defined. You can see a few practical examples in the section Choosing between overload candidates in the absence of a perfect match below.

The risk occurs because PostgreSQL, and therefore YSQL, support implicit data type conversion. The rules for data type conversion, and the consequences for how an overload candidate is selected, require careful explanation. The YSQL documentation doesn't reproduce the accounts. Rather, you should read the sections Type Conversion and Function Overloading in the PostgreSQL documentation.

Subprograms with different subprogram_call_signatures can share the same subprogram_name. If two or more subprograms share the same subprogram_name, then the subprogram_name is said to be overloaded. Notice the relationship between the subprogram_signature rule:

subprogram_signature ::= arg_decl [ , ... ]

arg_decl ::= [ arg_name ] [ arg_mode ] arg_type

arg_name ::= name

arg_mode ::= IN | OUT | INOUT | VARIADIC

arg_type ::= type_name

subprogram_signature

,arg_decl

arg_decl

arg_namearg_modearg_type

arg_name

name

arg_mode

INOUTINOUTVARIADIC

arg_type

type_name

and the subprogram_call_signature rule:

subprogram_call_signature ::= arg_type [ , ... ]

subprogram_call_signature

,arg_type

The subprogram_signature is a list of arg_decls; and an arg_decl has two optional components (arg_type and arg_mode) and one mandatory component (arg_type). But the only significant part of the subprogram_signature for distinguishing between overloads is the mandatory arg_type component.

'OUT' arguments are not included in the 'subprogram_call_signature'.

This rule is stated in the PostgreSQL documentation in the account of the pg_proc catalog table. See the description of the proargtypes column.

Yugabyte recommends that you never create a function with an out or inout argument but, rather, that you return all values of interest by specifying an appropriate composite data type for the returns clause. This implies preferring returns table(...) over returns setof. The latter requires a list of out arguments that correspond to the columns that you list, when you use the former, within the parenthesis of table(...).

See, too, this tip in the account of the create function statement.

As it happens, PostgreSQL Version 11.2 (on which YSQL is based) does not yet allow a procedure to have an out argument—so you must use an inout argument instead. See GitHub Issue #12348. YSQL, in some future version of YugabyteDB, will use a version of PostgreSQL where this limitation is removed. If you then decide to give a procedure a bare out argument, then you should remember that it will not be included in the procedure's call signature.

Negative examples

Notice that each example starts by dropping and re-creating the schema 's'.

It's easy to get confused when you try ad hoc tests, like those that this page shows, by the presence of previously-created overloads of the same subprogram_name . All the examples shown on this page, therefore, create their objects in the schema s1 and drop and recreate this at the start.

Attempt to create two functions that differ only by the names of their single argument.

drop schema if exists s1 cascade;
create schema s1;

create function s1.f(i in int)
  returns int
  language sql
as $body$
  select i*2;
$body$;

create function s1.f(j in int)
  returns int
  language sql
as $body$
  select j*2;
$body$;

The second create function attempt causes the 42723 error, function "f" already exists with same argument types. The names of the arguments, and their modes, are insignificant for distinguishing between overload candidates.

With the function s1.f(i in int) still in place, now try this:

create function s1.f(i in int)
  returns text
  language sql
as $body$
  select (i*2)::text;
$body$;

This, too, causes the 42723 error. These two versions of s1.f() have identical subprogram_call_signatures and differ only in the data types that they return. This difference, too, is insignificant for distinguishing between overload candidates.

With the function s1.f(i in int) still in place, now try this:

create procedure s1.f(i in int)
  language sql
as $body$
begin
  raise info 'procedure f(i in int)';
end;
$body$;

Once again, you get the 42723 error. The function and the procedure have the same subprogram_call_signature—but the kind of the subprogram, function or procedure is also insignificant for distinguishing between overload candidates.

The only significant information for distinguishing between overload candidates is the subprogram_call_signature.

Simple positive, but contrived, example

Try this:

drop schema if exists s1 cascade;
create schema s1;

create function s1.f(i in text)
  returns text
  language plpgsql
as $body$
begin
  return 'text overload';
end
$body$;

drop function if exists f(varchar) cascade;
create function s1.f(i in varchar)
  returns text
  language plpgsql
as $body$
begin
  return 'varchar overload';
end
$body$;

select
  (select s1.f('dog'::text))     as "typecast to 'text'",
  (select s1.f('dog'::varchar))  as "typecast to 'varchar'";

This is the result:

 typecast to 'text' | typecast to 'varchar'
--------------------+-----------------------
 text overload      | varchar overload

Now leave out the explicit typecast:

select s1.f('dog');

This is the result:

       f
---------------
 text overload

Now drop the text overload and repeat the identical query:

drop function s1.f(text);
select s1.f('dog');

This is the new result:

        f
------------------
 varchar overload

These two query results emphasize how treacherous the terrain is. The literal 'dog' is legal both as a text value and as a varchar value; and nothing about it indicates which is preferred. A committee decided, doubtless after careful thought, that:

  • The interpretation of the unadorned literal _'dog' as a text value is preferred over the interpretation as a varchar value—when both candidates exist.
  • The interpretation of the unadorned literal _'dog' as a varchar value allowed when no better candidate exists.

Notice that, now, even this attempt, where and explicit typecast to text is used, still resolves to the varchar candidate without error.

select s1.f('dog'::text);

Repeat this test but add a char overload into the mix. Experiment with various orders of dropping until you get down to just a single remaining overload. (This implies six different experiments.) The results might confuse you—and you'd probably find it hard to summarize them succinctly.

Most ordinary programmers find it impossible to memorise the actual rules that govern scenarios like these and to rely on them safely. The next section pursues this point with some more examples.

Don't mix the use of _text_ and _varchar_ in the same application.

Yugabyte recommends that you preemptively avoid the risk of overload confusion like the examples in this subsection show by making an up-front decision, at design specification time, between using only text or only varchar for string values. Use text when you have no interest in constraining the lengths of the strings; and use varchar when you do want to constrain specific kinds of strings to specific lengths.

(The char data type has its own peculiarities. It's best to avoid this data type altogether unless you can write down a very convincing reason in the design specification for why it's needed.)

Choosing between overload candidates in the absence of a perfect match

The following examples hammer home the point that the rules for choosing which overload to use when there is more than one plausible candidate, and which implicit typecasts are supported, are tortuous.

First create the basic setup with a text overload, an int overload, and a boolean overload of the name s1.f:

drop schema if exists s1 cascade;
create schema s1;

create function s1.f(i in text)
  returns text
  language plpgsql
as $body$
begin
  return 'text overload';
end
$body$;

drop function if exists s1.f(int) cascade;
create function s1.f(i in int)
  returns text
  language plpgsql
as $body$
begin
  return 'int overload';
end
$body$;

drop function if exists s1.f(boolean) cascade;
create function s1.f(i in boolean)
  returns text
  language plpgsql
as $body$
begin
  return 'boolean overload';
end
$body$;

select
  (select s1.f('dog'))  as "s1.f('dog')",
  (select s1.f(17))     as "s1.f(17)",
  (select s1.f(true))   as "s1.f(true))";

This is the result:

  s1.f('dog')  |   s1.f(17)   |   s1.f(true))
---------------+--------------+------------------
 text overload | int overload | boolean overload

Now do this:

drop function s1.f(text);

select
  (select s1.f('42'))  as "s1.f('42')",
  (select s1.f(17))    as "s1.f(17)";

The drop leaves behind the int and the boolean overloads. But, because the string 'true' can be typecast to boolean, the select causes this error:

42725: function s1.f(unknown) is not unique

Remove the source of confusion and try the query again:

drop function s1.f(boolean);

select
  (select s1.f('42'))  as "s1.f('42')",
  (select s1.f(17))    as "s1.f(17)";

Now the error goes away and you get this result:

  s1.f('42')  |   s1.f(17)
--------------+--------------
 int overload | int overload

Now remove the int overload and put back the boolean overload and try a different query:

drop function s1.f(int);
create function s1.f(i in boolean)
  returns text
  language plpgsql
as $body$
begin
  return 'boolean overload';
end
$body$;

select
  (select s1.f(true))    as "s1.f(true)",
  (select s1.f('true'))  as "s1.f('true')";

Again, there's no error; and you get this result:

    s1.f(true)    |   s1.f('true')
------------------+------------------
 boolean overload | boolean overload

Finally, put back the text overload and repeat the same query:

create function s1.f(i in text)
  returns text
  language plpgsql
as $body$
begin
  return 'text overload';
end
$body$;

select
  (select s1.f(true))    as "s1.f(true)",
  (select s1.f('true'))  as "s1.f('true')";

This time, the fact that 'true' can be implicitly converted to a boolean does not cause the "function s1.f(unknown) is not unique" error. Rather, this is the result:

    s1.f(true)    | s1.f('true')
------------------+---------------
 boolean overload | text overload

Consider distinguishing between intended overloads by using different names.

The tests shown on this page make the point that the rules for choosing which overload to use in the presence of more than one plausible candidate are tortuous. And in such scenarios, YSQL sometimes chooses a candidate without error and sometimes you get the "function s1.f(unknown) is not unique" error.

Consider the built-in functions that operate on (plain) json and on jsonb values. Several of them could have been implemented as overloads of the same name. But even so, the designers of this functionality decided to use a naming convention where what might have been an overload pair have json or jsonb in their names.

This approach is always open to you.