pgsql: When appropriate, postpone SELECT output expressions till after

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: When appropriate, postpone SELECT output expressions till after
Date: 2016-03-11 17:28:00
Message-ID: E1aeQr2-0004Ff-7z@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

When appropriate, postpone SELECT output expressions till after ORDER BY.

It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done. Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order. For example, in
SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.

In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step. Getting the desired
behavior reliably required contortions like
SELECT x, nextval('seq')
FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary. Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse. Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement. We might tweak the decision rules in the
future, though.

Konstantin Knizhnik, heavily rewritten by me

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/9118d03a8cca3d97327c56bf89a72e328e454e63

Modified Files
--------------
doc/src/sgml/ref/select.sgml | 30 ++++
src/backend/optimizer/plan/planner.c | 324 ++++++++++++++++++++++++++++++-----
src/test/regress/expected/limit.out | 129 ++++++++++++++
src/test/regress/sql/limit.sql | 39 +++++
4 files changed, 478 insertions(+), 44 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Robert Haas 2016-03-11 17:36:34 pgsql: pg_upgrade: Convert old visibility map format to new format.
Previous Message Teodor Sigaev 2016-03-11 17:12:08 pgsql: Fix Windows build broken in 6943a946c7e5eb72d53c0ce71f08a81a1335