From: | Lauro Ojeda <lauro(dot)ojeda(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Partition pruning is not happening (even in PG18) |
Date: | 2025-09-29 13:49:31 |
Message-ID: | CAG3nGZ-PpRarYRqbMzLZoVkma+ifSmNjmuPSL0tT3_opMZ1Dxg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi David,
Thank you for your nice reply.
I have the impression there is something heavily penalizing the usage of
partition pruning. While trying to go a bit further, I realized that
partition pruning is not occurring because the planner gives the cost of
seq-scanning all partitions to just over 2M units, but the query takes over
3 minutes. When forcing the usage of PK index via hinting (which down the
line forces the partition pruning to happen), the cost sky-rocket to 53M
units (however scanning 4 partitions out of 12). Even though with much
higher cost, the query launches less workers and completes in a third (or
less) of the time of all partition scans.
My observation is that the cost of seq-scanning each partition is estimated
in ~124k units, but index-scanning each partition is 440k (say, 3x higher).
Therefore, scanning 4 partitions, the total cost should be ~1.7M units,
which is below the 2M units from the cost of all partition scanning, thus
the planner should prefer the indexed path instead. However when analyzing
the hinted query (doing index scan and thus partition pruning) we can note
that the cost estimated to each partition scan was added to the final cost,
even though 8 out of 12 partitions were not scanned (never executed).
postgres=> explain analyze
/*+ IndexScan(accounts accounts_pkey) */
select aid, abalance
from accounts
where transaction_date in (select dt_col from t2);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.56..53359847.63 rows=2767123 width=8) (actual
time=2.299..111346.649 rows=1664742 loops=1)
-> HashAggregate (cost=1.12..1.23 rows=10 width=4) (actual
time=1.060..40.562 rows=6 loops=1)
Group Key: t2.dt_col
Batches: 1 Memory Usage: 24kB
-> Seq Scan on t2 (cost=0.00..1.10 rows=10 width=4) (actual
time=1.048..1.050 rows=6 loops=1)
-> Append (cost=0.43..5302777.61 rows=3320703 width=12) (actual
time=1.554..18514.181 rows=277457 loops=6)
-> Index Scan using accounts_p1_pkey on accounts_p1 accounts_1
(cost=0.43..434635.82 rows=273011 width=12) (actual time=2.214..17627.113
rows=276971 loops=1)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p2_pkey on accounts_p2 accounts_2
(cost=0.43..441097.44 rows=277283 width=12) (actual time=2.185..16280.020
rows=276634 loops=1)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p3_pkey on accounts_p3 accounts_3
(cost=0.43..441666.94 rows=277392 width=12) (actual time=0.751..25992.220
rows=278390 loops=1)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p4_pkey on accounts_p4 accounts_4
(cost=0.43..441689.70 rows=277483 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p5_pkey on accounts_p5 accounts_5
(cost=0.43..441883.26 rows=277546 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p6_pkey on accounts_p6 accounts_6
(cost=0.43..*441857.26 *rows=277591 width=12) (actual
time=1.376..17001.777 rows=277582 loops=3)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p7_pkey on accounts_p7 accounts_7
(cost=0.43..441837.57 rows=277503 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p8_pkey on accounts_p8 accounts_8
(cost=0.43..441843.95 rows=277511 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p9_pkey on accounts_p9 accounts_9
(cost=0.43..441711.03 rows=277506 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p10_pkey on accounts_p10 accounts_10
(cost=0.43..441918.96 rows=277554 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p11_pkey on accounts_p11 accounts_11
(cost=0.43..441501.86 rows=277377 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
-> Index Scan using accounts_p12_pkey on accounts_p12 accounts_12
(cost=0.43..434530.31 rows=272946 width=12) (never executed)
Index Cond: (transaction_date = t2.dt_col)
Planning Time: 0.879 ms
Execution Time: 111432.755 ms
(32 rows)
By looking into it, I have the impression there is a bug in the costing sum
in that situation, where the cost of the "never executed" partitions should
be deducted from the final cost estimation, which would make pruning to be
the preferred option in this case.
Are my assumptions correct?
Is there anything I could do to influence the planner to dismiss the cost
of "never executed" scans?
Kind regards,
Lauro Ojeda
On Thu, 25 Sept 2025 at 22:18, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Fri, 26 Sept 2025 at 07:49, Lauro Ojeda <lauro(dot)ojeda(at)gmail(dot)com> wrote:
> > The only way I found to make pruning work is to force index_scan using
> pg_hint_plan, but I wanted to influence the planner to decide it by itself
> rather than relying on hints. What's the reason for this misbehaving and
> what could I do to overcome it?
>
> > Partition key: RANGE (transaction_date)
>
> > postgres=> explain analyze
> > select aid, abalance
> > from accounts
> > where transaction_date in (select dt_col from t2);
>
> The only partition pruning that exists in PostgreSQL that can prune
> for that query is for parameterised Nested Loop joins. For Hash Join,
> it's been talked about, but this requires running the partition
> pruning code for every values that goes into the Hash Table and only
> scanning the unioned set of those partitions during the hash probe
> phase. The trouble with that is that it's very hard to know in advance
> if it'll be worth the extra effort. Putting a tuple into a hash table
> is quite cheap. Running the pruning code for a range partitioned table
> is likely to be a few times more costly than the hash insert
> (depending on how many partitions there are), so if the end result is
> that nothing was pruned, then that's quite a bit of extra effort for
> no gain.
>
> What we maybe could do better is reduce the cost of the Append scan
> when there's a run-time pruning object attached. This is a little
> tricky as we currently only build that object when creating the final
> plan. To include that in the costs we'd need to move that to the Path
> generation phase so that we didn't accidentally reject Paths which
> could be cheaper than we first think.
>
> > Also, how could I contribute to get this partition pruning to work?
>
> The pgsql-hackers mailing list is where all the discussions about that
> happen. There is plenty of past discussions on these topics. One such
> (fairly) recent discussion is in [1]. There are plenty more, including
> some ideas from Robert Haas about how we might cost run-time partition
> pruning. That was likely around 2017-2018 range, so you might need to
> dig deep to find that.
>
> David
>
> [1]
> https://www.postgresql.org/message-id/flat/CAApHDvoC7n_oceb%3D8z%2BMY8sTgH4xa%2ByAwBxZ4Dxv8pwkT9bOcA%40mail.gmail.com#45314d3d01ef8ad1eebe72111989062c
>
--
*Lauro Ojeda*
From | Date | Subject | |
---|---|---|---|
Next Message | Chetan | 2025-09-29 20:05:27 | Re: Partition pruning is not happening (even in PG18) |
Previous Message | Pavel Stehule | 2025-09-29 10:13:25 | Re: proposal: schema variables |