| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Cc: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Subject: | Unfortunate pushing down of expressions below sort |
| Date: | 2026-02-05 20:52:30 |
| Message-ID: | qddu2agmbiadpqkavqlamccrqxg6rm64sdq6q7ihpyndyfim6k@2dpox3fupa2g |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
I was recently [1] reminded of something I've seen be problematic before:
We push down expressions below a sort node, even though they could be
evaluated above. That can very substantially increase the space needed for the
sort.
A simplified (and extreme-y-fied) example:
EXPLAIN (VERBOSE, ANALYZE) SELECT repeat(g.i::text, 1000) FROM generate_series(1, 10000) g(i) ORDER BY g.i;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Sort (cost=839.39..864.39 rows=10000 width=36) (actual time=65.905..66.552 rows=10000.00 loops=1) │
│ Output: (repeat((i)::text, 1000)), i │
│ Sort Key: g.i │
│ Sort Method: quicksort Memory: 38601kB │
│ -> Function Scan on pg_catalog.generate_series g (cost=0.00..175.00 rows=10000 width=36) (actual time=0.896..48.459 rows=10000.00 loops=1) │
│ Output: repeat((i)::text, 1000), i │
│ Function Call: generate_series(1, 10000) │
│ Planning Time: 0.063 ms │
│ Execution Time: 69.253 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)
I can manually rewrite that be executed better:
EXPLAIN (VERBOSE, ANALYZE) SELECT repeat(i::text, 1000) FROM (SELECT * FROM generate_series(1, 10000) g(i) ORDER BY g.i);
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Subquery Scan on unnamed_subquery (cost=764.39..864.39 rows=10000 width=32) (actual time=2.642..50.738 rows=10000.00 loops=1) │
│ Output: repeat((unnamed_subquery.i)::text, 1000) │
│ -> Sort (cost=764.39..789.39 rows=10000 width=4) (actual time=2.633..3.342 rows=10000.00 loops=1) │
│ Output: g.i │
│ Sort Key: g.i │
│ Sort Method: quicksort Memory: 385kB │
│ -> Function Scan on pg_catalog.generate_series g (cost=0.00..100.00 rows=10000 width=4) (actual time=0.999..1.690 rows=10000.00 loops=1) │
│ Output: g.i │
│ Function Call: generate_series(1, 10000) │
│ Planning Time: 0.063 ms │
│ Execution Time: 51.648 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)
Note that the runtime as well as the memory usage are reduced noticeably.
It's even worse when there is a LIMIT above the sort, because it leads to
evaluating the expression way more often than needed:
EXPLAIN (VERBOSE, ANALYZE) SELECT repeat(g.i::text, 1000) FROM generate_series(1, 10000) g(i) ORDER BY g.i LIMIT 10;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=391.10..391.12 rows=10 width=36) (actual time=50.910..50.912 rows=10.00 loops=1) │
│ Output: (repeat((i)::text, 1000)), i │
│ -> Sort (cost=391.10..416.10 rows=10000 width=36) (actual time=50.908..50.909 rows=10.00 loops=1) │
│ Output: (repeat((i)::text, 1000)), i │
│ Sort Key: g.i │
│ Sort Method: top-N heapsort Memory: 36kB │
│ -> Function Scan on pg_catalog.generate_series g (cost=0.00..175.00 rows=10000 width=36) (actual time=0.869..47.820 rows=10000.00 loops=1) │
│ Output: repeat((i)::text, 1000), i │
│ Function Call: generate_series(1, 10000) │
│ Planning Time: 0.074 ms │
│ Execution Time: 50.938 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)
vs:
EXPLAIN (VERBOSE, ANALYZE) SELECT repeat(i::text, 1000) FROM (SELECT * FROM generate_series(1, 10000) g(i) ORDER BY g.i LIMIT 10);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Subquery Scan on unnamed_subquery (cost=316.10..316.20 rows=10 width=32) (actual time=3.098..3.149 rows=10.00 loops=1) │
│ Output: repeat((unnamed_subquery.i)::text, 1000) │
│ -> Limit (cost=316.10..316.12 rows=10 width=4) (actual time=3.086..3.090 rows=10.00 loops=1) │
│ Output: g.i │
│ -> Sort (cost=316.10..341.10 rows=10000 width=4) (actual time=3.083..3.085 rows=10.00 loops=1) │
│ Output: g.i │
│ Sort Key: g.i │
│ Sort Method: top-N heapsort Memory: 25kB │
│ -> Function Scan on pg_catalog.generate_series g (cost=0.00..100.00 rows=10000 width=4) (actual time=1.482..2.244 rows=10000.00 loops=1) │
│ Output: g.i │
│ Function Call: generate_series(1, 10000) │
│ Planning Time: 0.073 ms │
│ Execution Time: 3.185 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Now, a repeat(,1000) is obviously a silly example, but I think this is a real
issue. In the case in [1], deferring the evaluation of acldefault() till after
the sort reduces memory consumption by ~38%
Why are we evaluating the expression below the sort instead of above? I can
maybe see an argument for doing that if it's volatile, but it's not.
Interestingly we seem to do the sane thing for aggregation:
EXPLAIN (VERBOSE, ANALYZE) SELECT repeat(g.i::text, 1000) FROM generate_series(1, 10000) g(i) GROUP BY g.i;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ HashAggregate (cost=125.00..128.50 rows=200 width=36) (actual time=4.575..52.142 rows=10000.00 loops=1) │
│ Output: repeat((i)::text, 1000), i │
│ Group Key: g.i │
│ Batches: 1 Memory Usage: 553kB │
│ -> Function Scan on pg_catalog.generate_series g (cost=0.00..100.00 rows=10000 width=4) (actual time=0.897..1.518 rows=10000.00 loops=1) │
│ Output: i │
│ Function Call: generate_series(1, 10000) │
│ Planning Time: 0.042 ms │
│ Execution Time: 53.126 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)
Note that the repeat() is computed above the aggregate. That also is true if
it's a sort based agg...
Greetings,
Andres Freund
[1] https://postgr.es/m/wgf63h3doepg2jnmofzbygrg7jujbjvxwkvoc7arej2zqcuf6c%403tzz22tizuew
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Marcos Magueta | 2026-02-05 21:18:16 | Re: WIP - xmlvalidate implementation from TODO list |
| Previous Message | Sami Imseih | 2026-02-05 20:43:38 | Re: Fix pg_stat_get_backend_wait_event() for aux processes |