From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | michael(at)joincandidhealth(dot)com |
Subject: | BUG #18834: Query planer is choosing the sub-optimal plan when limit is present |
Date: | 2025-03-07 16:09:38 |
Message-ID: | 18834-ec88fdda949458d0@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 18834
Logged by: Michael Stoeckel
Email address: michael(at)joincandidhealth(dot)com
PostgreSQL version: 15.10
Operating system: x86_64-pc-linux-gnu
Description:
We have a table, `fn_ledger`, containing approximately **80 million
records**. The table includes a **primary key (`id`)** of type **UUID** and
a **`provenance`** column of type **JSONB**, which is **GIN indexed**, along
with other fields.
We observed **poor query performance** when using an **`ORDER BY`** clause
combined with **`LIMIT`**, which led us to investigate further.
Consider the following simple query:
```
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id
```
In the query above, the `WHERE` clause is **highly selective**, meaning it
will return at most a handful of records.
Through our investigation, we found that:
- **When `LIMIT` is not present**, the planner correctly chooses to use the
**bitmap index scan** on `ix_fn_ledger_provenance`, resulting in efficient
query execution.
- **When `LIMIT` is introduced**, the planner **incorrectly opts** for a
sequential scan on `fn_ledger_pkey` while filtering on the `provenance`
value, leading to poor performance.
See the query plans below:
```
postgres=> explain analyze
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=32625.36..32645.41 rows=8020 width=333) (actual
time=2.943..2.944 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on fn_ledger (cost=1230.16..32105.29 rows=8020
width=333) (actual time=2.939..2.939 rows=1 loops=1)
Recheck Cond: (provenance @> '{"locator":
"52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on ix_fn_ledger_provenance
(cost=0.00..1228.15 rows=8020 width=0) (actual time=2.923..2.924 rows=1
loops=1)
Index Cond: (provenance @> '{"locator":
"52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
Planning Time: 0.115 ms
Execution Time: 2.978 ms
(10 rows)
postgres=> explain analyze
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id
limit 1;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1816.54 rows=1 width=333) (actual
time=317508.674..317508.675 rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14564073.97
rows=8020 width=333) (actual time=317508.673..317508.673 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
Rows Removed by Filter: 49197422
Planning Time: 0.148 ms
Execution Time: 317508.742 ms
(6 rows)
```
Furthermore, since the planner incorrectly chooses to perform an **index
scan** on `fn_ledger_pkey`, the query performance varies significantly
depending on **where the candidate record's `id` appears** within the
`fn_ledger_pkey` index.
To illustrate this, I selected the **first and last records** based on the
ordering of the `id` column.
```
select id, provenance->'locator'
from fn_ledger
where id in (
'00000025-fb46-46cd-b7eb-19e111fa1e87',
'ffffffdb-23f6-4a6e-9385-86e5e32a9309'
)
===================
00000025-fb46-46cd-b7eb-19e111fa1e87
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"
ffffffdb-23f6-4a6e-9385-86e5e32a9309
"f2d20507-a543-4c38-b3e3-c977ac407b42"
```
then, I executed the following queries, both with `LIMIT`
```
postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'
order by id
limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1817.68 rows=1 width=333) (actual time=0.016..0.017
rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14478744.69
rows=7968 width=333) (actual time=0.015..0.016 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
Planning Time: 0.184 ms
Execution Time: 0.031 ms
(5 rows)
postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"f2d20507-a543-4c38-b3e3-c977ac407b42"}'
order by id
limit 1;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1817.75 rows=1 width=333) (actual
time=57515.333..57515.334 rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14479340.89
rows=7968 width=333) (actual time=57515.331..57515.332 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"f2d20507-a543-4c38-b3e3-c977ac407b42"}'::jsonb)
Rows Removed by Filter: 30847879
Planning Time: 0.147 ms
Execution Time: 57515.418 ms
(6 rows)
```
As you can see, both queries are **identical**, yet the first query is
**orders of magnitude faster** than the second. This difference occurs
because the **candidate record** for the first query appears **towards the
beginning** of the `fn_ledger_pkey` index, allowing it to be found quickly.
In contrast, the candidate record for the second query appears **towards the
end** of the index, requiring significantly more scanning.
Beyond this inconsistency, if the **`LIMIT` value exceeds the actual number
of records found by the query**, the index scan will have to **traverse the
entire index**, regardless of the candidate record’s position. This further
degrades performance.
```
postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'
order by id
limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1816.41 rows=1 width=333) (actual time=0.013..0.014
rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14564849.49
rows=8021 width=333) (actual time=0.013..0.013 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
Planning Time: 0.145 ms
Execution Time: 0.027 ms
(5 rows)
postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'
order by id
limit 2;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..3632.25 rows=2 width=333) (actual time=0.017..126786.189
rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14564849.49
rows=8021 width=333) (actual time=0.016..126786.187 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
Rows Removed by Filter: 80472490
Planning Time: 0.148 ms
Execution Time: 126786.246 ms
(6 rows)
```
As you can see from the above queries, the only difference is that the
second query's `LIMIT` is 2 instead of 1.
In conclusion, it seems that the query planner is **placing too much
emphasis on the `LIMIT` clause**, prioritizing it over selecting the more
efficient `ix_fn_ledger_provenance` index. As a result, it **ignores the
optimal indexing strategy** and instead opts for a less efficient scan on
`fn_ledger_pkey`, leading to inconsistent and suboptimal query performance.
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2025-03-07 18:00:01 | BUG #18835: spgist index fails to accept point with NaN |
Previous Message | Greg Sabino Mullane | 2025-03-07 14:23:40 | Re: Window Functions with identical PARTITION BY and ORDER BY clauses evaluated separately |