Re: Partition pruning is not happening (even in PG18)

From: Chetan <chetansuttraway(at)gmail(dot)com>
To: Lauro Ojeda <lauro(dot)ojeda(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql-performance(at)postgresql(dot)org
Subject: Re: Partition pruning is not happening (even in PG18)
Date: 2025-09-29 20:05:27
Message-ID: CAHAB5PakNwwa6kBea__UgfCGNn7NNMm6HpqiuAcMvRxMXOQFyw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 29 Sept 2025 at 17:55, Lauro Ojeda <lauro(dot)ojeda(at)gmail(dot)com> wrote:

> Hi David,
> Thank you for your nice reply.
>
> I have the impression there is something heavily penalizing the usage of
> partition pruning. While trying to go a bit further, I realized that
> partition pruning is not occurring because the planner gives the cost of
> seq-scanning all partitions to just over 2M units, but the query takes over
> 3 minutes. When forcing the usage of PK index via hinting (which down the
> line forces the partition pruning to happen), the cost sky-rocket to 53M
> units (however scanning 4 partitions out of 12). Even though with much
> higher cost, the query launches less workers and completes in a third (or
> less) of the time of all partition scans.
>
> My observation is that the cost of seq-scanning each partition is
> estimated in ~124k units, but index-scanning each partition is 440k (say,
> 3x higher). Therefore, scanning 4 partitions, the total cost should be
> ~1.7M units, which is below the 2M units from the cost of all partition
> scanning, thus the planner should prefer the indexed path instead. However
> when analyzing the hinted query (doing index scan and thus partition
> pruning) we can note that the cost estimated to each partition scan was
> added to the final cost, even though 8 out of 12 partitions were not
> scanned (never executed).
>
> postgres=> explain analyze
> /*+ IndexScan(accounts accounts_pkey) */
> select aid, abalance
> from accounts
> where transaction_date in (select dt_col from t2);
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=1.56..53359847.63 rows=2767123 width=8) (actual
> time=2.299..111346.649 rows=1664742 loops=1)
> -> HashAggregate (cost=1.12..1.23 rows=10 width=4) (actual
> time=1.060..40.562 rows=6 loops=1)
> Group Key: t2.dt_col
> Batches: 1 Memory Usage: 24kB
> -> Seq Scan on t2 (cost=0.00..1.10 rows=10 width=4) (actual
> time=1.048..1.050 rows=6 loops=1)
> -> Append (cost=0.43..5302777.61 rows=3320703 width=12) (actual
> time=1.554..18514.181 rows=277457 loops=6)
> -> Index Scan using accounts_p1_pkey on accounts_p1 accounts_1
> (cost=0.43..434635.82 rows=273011 width=12) (actual
> time=2.214..17627.113 rows=276971 loops=1)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p2_pkey on accounts_p2 accounts_2
> (cost=0.43..441097.44 rows=277283 width=12) (actual
> time=2.185..16280.020 rows=276634 loops=1)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p3_pkey on accounts_p3 accounts_3
> (cost=0.43..441666.94 rows=277392 width=12) (actual
> time=0.751..25992.220 rows=278390 loops=1)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p4_pkey on accounts_p4 accounts_4
> (cost=0.43..441689.70 rows=277483 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p5_pkey on accounts_p5 accounts_5
> (cost=0.43..441883.26 rows=277546 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p6_pkey on accounts_p6 accounts_6
> (cost=0.43..*441857.26 *rows=277591 width=12) (actual
> time=1.376..17001.777 rows=277582 loops=3)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p7_pkey on accounts_p7 accounts_7
> (cost=0.43..441837.57 rows=277503 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p8_pkey on accounts_p8 accounts_8
> (cost=0.43..441843.95 rows=277511 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p9_pkey on accounts_p9 accounts_9
> (cost=0.43..441711.03 rows=277506 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p10_pkey on accounts_p10
> accounts_10 (cost=0.43..441918.96 rows=277554 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p11_pkey on accounts_p11
> accounts_11 (cost=0.43..441501.86 rows=277377 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> -> Index Scan using accounts_p12_pkey on accounts_p12
> accounts_12 (cost=0.43..434530.31 rows=272946 width=12) (never executed)
> Index Cond: (transaction_date = t2.dt_col)
> Planning Time: 0.879 ms
> Execution Time: 111432.755 ms
> (32 rows)
>
> By looking into it, I have the impression there is a bug in the costing
> sum in that situation, where the cost of the "never executed" partitions
> should be deducted from the final cost estimation, which would make pruning
> to be the preferred option in this case.
>
>
The plan generation stage is where the costing sum is figured out whereas
the "never executed" marking/detection is in the execution stage.
So this doesnt look to be a costing bug for the above scenario where the
planner was unable to prune partitions early.

> Are my assumptions correct?
> Is there anything I could do to influence the planner to dismiss the cost
> of "never executed" scans?
>
> Kind regards,
> Lauro Ojeda
>
>
> On Thu, 25 Sept 2025 at 22:18, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
>> On Fri, 26 Sept 2025 at 07:49, Lauro Ojeda <lauro(dot)ojeda(at)gmail(dot)com> wrote:
>> > The only way I found to make pruning work is to force index_scan using
>> pg_hint_plan, but I wanted to influence the planner to decide it by itself
>> rather than relying on hints. What's the reason for this misbehaving and
>> what could I do to overcome it?
>>
>> > Partition key: RANGE (transaction_date)
>>
>> > postgres=> explain analyze
>> > select aid, abalance
>> > from accounts
>> > where transaction_date in (select dt_col from t2);
>>
>> The only partition pruning that exists in PostgreSQL that can prune
>> for that query is for parameterised Nested Loop joins. For Hash Join,
>> it's been talked about, but this requires running the partition
>> pruning code for every values that goes into the Hash Table and only
>> scanning the unioned set of those partitions during the hash probe
>> phase. The trouble with that is that it's very hard to know in advance
>> if it'll be worth the extra effort. Putting a tuple into a hash table
>> is quite cheap. Running the pruning code for a range partitioned table
>> is likely to be a few times more costly than the hash insert
>> (depending on how many partitions there are), so if the end result is
>> that nothing was pruned, then that's quite a bit of extra effort for
>> no gain.
>>
>> What we maybe could do better is reduce the cost of the Append scan
>> when there's a run-time pruning object attached. This is a little
>> tricky as we currently only build that object when creating the final
>> plan. To include that in the costs we'd need to move that to the Path
>> generation phase so that we didn't accidentally reject Paths which
>> could be cheaper than we first think.
>>
>> > Also, how could I contribute to get this partition pruning to work?
>>
>> The pgsql-hackers mailing list is where all the discussions about that
>> happen. There is plenty of past discussions on these topics. One such
>> (fairly) recent discussion is in [1]. There are plenty more, including
>> some ideas from Robert Haas about how we might cost run-time partition
>> pruning. That was likely around 2017-2018 range, so you might need to
>> dig deep to find that.
>>
>> David
>>
>> [1]
>> https://www.postgresql.org/message-id/flat/CAApHDvoC7n_oceb%3D8z%2BMY8sTgH4xa%2ByAwBxZ4Dxv8pwkT9bOcA%40mail.gmail.com#45314d3d01ef8ad1eebe72111989062c
>>
>
>
> --
> *Lauro Ojeda*
>

--
--
Regards,
Chetan

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Rowley 2025-09-29 20:08:16 Re: Partition pruning is not happening (even in PG18)
Previous Message Lauro Ojeda 2025-09-29 13:49:31 Re: Partition pruning is not happening (even in PG18)