Undocumented Order By vs Target List Volatile Function Behavior

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Undocumented Order By vs Target List Volatile Function Behavior
Date: 2022-07-21 20:20:01
Message-ID: CAKFQuwZ3-XGfcS+CLTAYvPx3ARYjUxv+=YL8sOicV0nda=T5cA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hey,

This came up today on twitter as a claimed POLA violation:

postgres=# select random(), random() order by random();
random | random
---------------------+---------------------
0.08176638503720679 | 0.08176638503720679
(1 row)

Which was explained long ago by Tom as:

https://www.postgresql.org/message-id/9570.1193941378%40sss.pgh.pa.us

The parser makes it behave equivalent to:

SELECT random() AS foo ORDER BY foo;

Which apparently extends to any column, even aliased ones, that use the
same expression:

postgres=# select random() as foo, random() as foo2 order by foo;
foo | foo2
--------------------+--------------------
0.7334292196943459 | 0.7334292196943459
(1 row)

The documentation does say:

"A query using a volatile function will re-evaluate the function at every
row where its value is needed."

https://www.postgresql.org/docs/current/xfunc-volatility.html

That sentence is insufficient to explain why, without the order by, the
system chooses to evaluate random() twice, while with order by it does so
only once.

I propose extending the existing ORDER BY paragraph in the SELECT Command
Reference as follows:

"A limitation of this feature is that an ORDER BY clause applying to the
result of a UNION, INTERSECT, or EXCEPT clause can only specify an output
column name or number, not an expression."

Add:

A side-effect of this feature is that ORDER BY expressions containing
volatile functions will execute the volatile function only once for the
entire row; thus any column expressions using the same function will reuse
the same function result. By way of example, note the output differences
for the following two queries:

postgres=# select random() as foo, random()*1 as foo2 from
generate_series(1,2) order by foo;
foo | foo2
--------------------+--------------------
0.2631492904302788 | 0.2631492904302788
0.9019166692448664 | 0.9019166692448664
(2 rows)

postgres=# select random() as foo, random() as foo2 from
generate_series(1,2);
foo | foo2
--------------------+--------------------
0.7763978178239725 | 0.3569212477832773
0.7360531822096732 | 0.7028952103643864
(2 rows)

David J.

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-07-21 20:32:38 Re: System column support for partitioned tables using heap
Previous Message Jonathan S. Katz 2022-07-21 20:08:54 Re: Handle infinite recursion in logical replication setup