| From: | Илья Сербин <serbin(dot)ilia(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | SELECT over partitioned table with LIMIT 1 performance regression issue in PostgreSQL 17 and 18 |
| Date: | 2026-04-06 10:34:45 |
| Message-ID: | CAKS1CiSrRPDcR3YQkf95g-SY6y3TbhO2K4tgdGWbN-YYr0xivg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello,
I think I may have found a planner regression, or at least a planner corner
case, in PostgreSQL 17 and 18 involving a partitioned table, LIMIT 1, and
OR-ed timestamp ranges.
I could not find an existing report that matches this pattern closely, so I
am sending a reproducible example and the observations below to discuss
whether or not this behaviour is appropriate and expected.
Environment
-----------
Observed on:
- PostgreSQL 17.9 on Ubuntu 24.04
- PostgreSQL 15.x and 16.x do not show the same bad parent-level plan for
the same query pattern
Workload pattern
----------------
- A table partitioned by RANGE(ts_col), with weekly partitions
- B-tree indexes on ts_col and kind_col
- ts_col is highly correlated with heap order
- kind_col = 'k_a' is very common (about 81%)
- the query uses LIMIT 1 without ORDER BY
- the filter is:
kind_col IN (...)
AND (ts_col in range1 OR ts_col in range2)
Problem summary
---------------
On PostgreSQL 17, the query on the partitioned parent table may choose a
Seq Scan on the child partition instead of a ts_col-based bitmap/index
path, even though the ts_col-based path is much faster when forced.
What looks suspicious to me is this:
1. PostgreSQL 17 can use the ts_col index efficiently for a very similar
query when the OR condition is removed and only one timestamp range is used.
2. PostgreSQL 17 can also execute a forced bitmap path on ts_col very
quickly for the same child partition.
3. But for the parent-table query with OR-ed ranges and LIMIT 1, PostgreSQL
17 may choose a Seq Scan with much worse runtime.
There is also an important difference between PostgreSQL 16 and 17 here:
- On PostgreSQL 16, the child partition by itself may still choose Seq Scan
for this LIMIT 1 pattern, but the query on the partitioned parent chooses a
Bitmap-based Append plan and runs fast.
- On PostgreSQL 17, the parent-level plan no longer does that in this case
and regresses to the poor plan shape.
Representative observations
---------------------------
For the parent query on PostgreSQL 17:
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM fact_events
WHERE kind_col IN (
'k_a', 'k_b', 'k_c', 'k_d', 'k_e'
)
AND (
(ts_col >= '2026-03-28 07:42:30.405987' AND ts_col < '2026-03-28
08:00:00')
OR (ts_col >= '2026-03-31 12:00:00' AND ts_col < '2026-03-31
12:10:25.420965')
)
LIMIT 1;
Observed bad plan:
- Seq Scan on the child partition under Append
- runtime around 12 seconds on our production-sized dataset
- Rows Removed by Filter: about 33 million
On the same PostgreSQL 17 instance:
A) Parent query with only one timestamp range:
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM fact_events
WHERE kind_col IN (
'k_a', 'k_b', 'k_c', 'k_d', 'k_e'
)
AND ts_col >= '2026-03-28 07:42:30.405987'
AND ts_col < '2026-03-28 08:00:00'
LIMIT 1;
This uses the ts_col index and returns very quickly.
B) Forced bitmap path on the child partition:
SET enable_seqscan = off;
SET enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT 1
FROM fact_events_p2026w13
WHERE kind_col IN (
'k_a', 'k_b', 'k_c', 'k_d', 'k_e'
)
AND (
(ts_col >= '2026-03-28 07:42:30.405987' AND ts_col < '2026-03-28
08:00:00')
OR (ts_col >= '2026-03-31 12:00:00' AND ts_col < '2026-03-31
12:10:25.420965')
)
LIMIT 1;
This uses BitmapOr + Bitmap Heap Scan via ts_col_idx and also returns
quickly.
C) Default plan for the child partition alone:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT 1
FROM fact_events_p2026w13
WHERE kind_col IN (
'k_a', 'k_b', 'k_c', 'k_d', 'k_e'
)
AND (
(ts_col >= '2026-03-28 07:42:30.405987' AND ts_col < '2026-03-28
08:00:00')
OR (ts_col >= '2026-03-31 12:00:00' AND ts_col < '2026-03-31
12:10:25.420965')
)
LIMIT 1;
This can still choose Seq Scan on the child relation itself, but on
PostgreSQL 16 the parent-level Append plan is still good, while on
PostgreSQL 17 the parent-level plan may also fall back to the poor plan
shape.
Why this looks suspicious
-------------------------
- The ts_col-based path is clearly valid and fast.
- This is not just a matter of missing statistics or tuning. I tried
extended statistics, random_page_cost, effective_cache_size, and
combinations of these, and they did not fix the bad parent-level plan.
- I can reproduce the issue not only on a tuned production-like setup, but
also on plain PostgreSQL containers with no special tuning.
- The issue looks related to parent/Append planning for LIMIT 1 on a
partitioned table with OR-ed ranges.
- PostgreSQL 17 release notes mention planner changes in this area: LIMIT
optimization on partitioned tables, inheritance parents, and UNION ALL.
Commit that looks potentially relevant is a8a968a82 (Consider cheap startup
paths in add_paths_to_append_rel, PG17), which introduced startup-path
selection for Append nodes. I have not confirmed this is the root cause,
but the behavior and timing match: if Seq Scan wins as
cheapest_startup_path because BitmapOr has higher startup cost, the
parent-level Append would pick the wrong path for LIMIT queries.
My guess is that the planner is underestimating how bad the Seq Scan really
is for LIMIT 1 in this case. Since ts_col is very highly correlated with
heap order, the matching rows are physically located close to the end of
the partition, so the Seq Scan ends up reading a very large part of the
table before it finds the first match.
Testing
------------
I am attaching two SQL files:
- reproduce_preparation_env.sql
This file creates the schema, partitions, indexes, and loads the test data.
- reproduce.sql
This file runs the actual query tests and prints short headings and
comments before each step, so the output is easier to read.
The tests were performed on four clean Docker containers running PostgreSQL
15, 16, 17, 18
Notes
-----
- Exact runtimes will depend on hardware and cache state, but the plan
change itself is reproducible. I'm using SET synchronize_seqscans=off; for
more deterministic behaviour of the tests
- In the attached reproduction scripts, the generated row counts are
already large enough to show the difference.
- In my production case, correlation(ts_col) is about 0.999 and kind_col =
'k_a' is about 81%.
Thanks.
| Attachment | Content-Type | Size |
|---|---|---|
| reproduce_preparation_env.sql | application/octet-stream | 4.0 KB |
| reproduce.sql | application/octet-stream | 6.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Suraj Kharage | 2026-04-06 10:43:36 | Re: [PATCH] Add support for INSERT ... SET syntax |
| Previous Message | Mihail Nikalayeu | 2026-04-06 10:14:57 | Re: Adding REPACK [concurrently] |