Re: Unfortunate pushing down of expressions below sort

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Unfortunate pushing down of expressions below sort
Date: 2026-04-07 20:16:54
Message-ID: 2351008.1775593014@sss.pgh.pa.us
Lists: pgsql-hackers

Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com> writes:
> Following up on the discussion below, I now have a patch.

> The patch extends make_sort_input_target() with a conservative rule:
> defer additional non-sort targetlist expressions past Sort only when
> doing so does not require carrying any additional Vars/PlaceHolderVars
> through Sort. This way, Sort input width never increases.

I spent some time thinking about this.

One thing I think we need to keep in mind is that if we don't postpone
an expression past Sort, and the user doesn't like that, she can
easily rewrite the query to force it; as indeed Andres demonstrated
at the start of this thread. But overriding an unwanted planner
decision to postpone is harder. I think you can do it with

SELECT * FROM (SELECT x,y,f(z) FROM ... OFFSET 0) ORDER BY whatever;

but if you forget the OFFSET-0 optimization fence you may find
f(z) getting evaluated after the sort anyway. And the fence might
foreclose some other optimization you did want.

Also, make_sort_input_target() has gone basically unchanged since
2016, without that many complaints. So I think we need to be pretty
conservative about adding postponement choices that aren't forced by
semantic requirements.

The rule stated above seems pretty conservative, but either it's not
conservative enough or you didn't implement it right, because the
regression test changes show the v2 patch is very willing to create
Result nodes where there were none before, even when there's no LIMIT
and thus no reason to think we can save any expression evaluations.
That extra plan node has nonzero cost that I don't think you're
accounting for. It'll still be a win if enough data volume is removed
from the Sort step, but I don't see any consideration of how much
we're actually saving before deciding to add the projection step.

So I think we need some sort of gating rule, whereby we only postpone
these expressions if (a) there was already a reason to add a
projection or (b) we can make some cost-based or at least heuristic
estimate that says we'll cut the sort data volume significantly.
Maybe (b) needs to interact with the existing heuristic about
postponing expensive expressions, not sure.
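
To make the shape of such a gating rule concrete, here is a minimal
standalone sketch in C. It is not planner code and not a proposed patch:
the struct, the function name, and the 25% threshold are all hypothetical
and exist nowhere in PostgreSQL. It assumes the caller already has
per-row width estimates for the Sort input with and without the
candidate expressions, and that the patch's "width never increases" rule
has already been satisfied.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical inputs for the gating decision.  None of these names
 * exist in PostgreSQL; they only illustrate conditions (a) and (b).
 */
typedef struct PostponeEstimate
{
    bool    projection_already_needed;  /* (a) a projection step is needed anyway */
    double  width_with_exprs;           /* est. bytes/row if evaluated below Sort */
    double  width_without_exprs;        /* est. bytes/row if postponed past Sort */
} PostponeEstimate;

/* Assumed cutoff for "significantly"; the right value is an open question. */
#define MIN_WIDTH_SAVINGS_FRACTION 0.25

/*
 * Return true if postponing the candidate expressions past Sort looks
 * worthwhile under this sketch's rule.
 */
bool
should_postpone(const PostponeEstimate *est)
{
    double  saved;

    /* Width estimates are never negative. */
    assert(est->width_with_exprs >= 0.0);
    assert(est->width_without_exprs >= 0.0);

    /* (a) No extra plan node is created, so postponing is free. */
    if (est->projection_already_needed)
        return true;

    /* Nothing to measure savings against. */
    if (est->width_with_exprs <= 0.0)
        return false;

    /* (b) Add a projection only if the Sort input shrinks enough. */
    saved = est->width_with_exprs - est->width_without_exprs;
    return (saved / est->width_with_exprs) >= MIN_WIDTH_SAVINGS_FRACTION;
}
```

With these assumed numbers, shrinking the Sort input from 100 to 60
bytes per row would justify a new projection step, while shrinking it
from 100 to 90 would not. A real rule would presumably also weigh row
counts and the per-tuple cost of the added node, which this sketch
ignores.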

Independently of that, I don't especially like the changes in
make_sort_input_target(). They seem rather inelegant and expensive
(and underdocumented), as well as duplicative of other work already
being done in the function. It may be time to tackle the unfinished
work mentioned in the existing comments about avoiding redundant
cost/width calculations ...

regards, tom lane
