| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com> |
| Subject: | Re: Unfortunate pushing down of expressions below sort |
| Date: | 2026-04-07 21:01:43 |
| Message-ID: | nggrz464c7hqsy45f7pymfn7mk7qaupcjc6jmsdcgwcshakabq@lwh7d4igd77o |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 2026-04-07 16:16:54 -0400, Tom Lane wrote:
> That extra plan node has nonzero cost that I don't think you're
> accounting for. It'll still be a win if enough data volume is removed
> from the Sort step, but I don't see any consideration of how much
> we're actually saving before deciding to add the projection step.
A different way (compared to the heuristics you were subsequently talking
about) to deal with that could be to add projection support to sort's output,
I guess. That would have a considerably smaller cost than a separate node.
It'd add a tiny bit of cost (an if checking for projection) for the rather
common of a sort without a projection. Although that'd not be too hard to get
rid of by generating specialized ExecSort() variants for the different cases,
like now done for ExecSeqScan, if it turned out to matter.
However it'd probably would still not be guaranteed faster than evaluating
below the sort, due to
a) startup cost of the projection machinery
b) potentially needing to deform during the expression's inputs during the
projection in the sort (or above)
I don't know if there are realistic cases where b) matters? You'd have to
have nodes above the sort that would require the projection to happen at the
sort (or an intermediary level) but then filter out most rows to avoid needing
to deform anyway? In all the realistic cases I can think of the expression
evaluation should then be pulled up above that filtering out of rows?
I don't know if a) is really ever significant enough to matter compared to the
cost of sorting. It sure shows up in a sequential scan, but that has an order
of magnitude or three lower per tuple cost.
Are there cases where something like Chengpeng's logic would still trigger a
result node being injected, if sort had projection capability?
> So I think we need some sort of gating rule, whereby we only postpone
> these expressions if (a) there was already a reason to add a
> projection or (b) we can make some cost-based or at least heuristic
> estimate that says we'll cut the sort data volume significantly.
This reminds me: The heuristics around the cost of expression evaluation seem
like they could be improved a fair bit by taking into account the cost of
having to deform the input columns. There's a huge difference between
func1(col1), func2(col1)
and
func1(col1), func2(col2)
and
func1(col30)
Unless func* are particularly expensive, the cost will be increasingly
dominated by tuple deforming.
I think this is one thing that sometimes contributes to us wrongly choosing
sequential scans with a qual over an index scans that needs to evaluate far
fewer rows, because we just take the operator costs into account, not the
tuple deforming it requires.
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2026-04-07 21:18:53 | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
| Previous Message | Matheus Alcantara | 2026-04-07 21:00:56 | Re: postgres_fdw: Use COPY to speed up batch inserts |