BUG #18834: Query planer is choosing the sub-optimal plan when limit is present

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: michael(at)joincandidhealth(dot)com
Subject: BUG #18834: Query planer is choosing the sub-optimal plan when limit is present
Date: 2025-03-07 16:09:38
Message-ID: 18834-ec88fdda949458d0@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18834
Logged by: Michael Stoeckel
Email address: michael(at)joincandidhealth(dot)com
PostgreSQL version: 15.10
Operating system: x86_64-pc-linux-gnu
Description:

We have a table, `fn_ledger`, containing approximately **80 million
records**. The table includes a **primary key (`id`)** of type **UUID** and
a **`provenance`** column of type **JSONB**, which is **GIN indexed**, along
with other fields.
We observed **poor query performance** when using an **`ORDER BY`** clause
combined with **`LIMIT`**, which led us to investigate further.
Consider the following simple query:
```
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id
```
In the query above, the `WHERE` clause is **highly selective**, meaning it
will return at most a handful of records.
Through our investigation, we found that:
- **When `LIMIT` is not present**, the planner correctly chooses to use the
**bitmap index scan** on `ix_fn_ledger_provenance`, resulting in efficient
query execution.
- **When `LIMIT` is introduced**, the planner **incorrectly opts** for a
sequential scan on `fn_ledger_pkey` while filtering on the `provenance`
value, leading to poor performance.

See the query plans below:
```
postgres=> explain analyze
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=32625.36..32645.41 rows=8020 width=333) (actual
time=2.943..2.944 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on fn_ledger (cost=1230.16..32105.29 rows=8020
width=333) (actual time=2.939..2.939 rows=1 loops=1)
Recheck Cond: (provenance @> '{"locator":
"52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on ix_fn_ledger_provenance
(cost=0.00..1228.15 rows=8020 width=0) (actual time=2.923..2.924 rows=1
loops=1)
Index Cond: (provenance @> '{"locator":
"52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
Planning Time: 0.115 ms
Execution Time: 2.978 ms
(10 rows)

postgres=> explain analyze
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id
limit 1;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1816.54 rows=1 width=333) (actual
time=317508.674..317508.675 rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14564073.97
rows=8020 width=333) (actual time=317508.673..317508.673 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
Rows Removed by Filter: 49197422
Planning Time: 0.148 ms
Execution Time: 317508.742 ms
(6 rows)
```
Furthermore, since the planner incorrectly chooses to perform an **index
scan** on `fn_ledger_pkey`, the query performance varies significantly
depending on **where the candidate record's `id` appears** within the
`fn_ledger_pkey` index.
To illustrate this, I selected the **first and last records** based on the
ordering of the `id` column.
```
select id, provenance->'locator'
from fn_ledger
where id in (
'00000025-fb46-46cd-b7eb-19e111fa1e87',
'ffffffdb-23f6-4a6e-9385-86e5e32a9309'
)
===================
00000025-fb46-46cd-b7eb-19e111fa1e87
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"
ffffffdb-23f6-4a6e-9385-86e5e32a9309
"f2d20507-a543-4c38-b3e3-c977ac407b42"
```
then, I executed the following queries, both with `LIMIT`
```
postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'
order by id
limit 1;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1817.68 rows=1 width=333) (actual time=0.016..0.017
rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14478744.69
rows=7968 width=333) (actual time=0.015..0.016 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
Planning Time: 0.184 ms
Execution Time: 0.031 ms
(5 rows)

postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"f2d20507-a543-4c38-b3e3-c977ac407b42"}'
order by id
limit 1;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1817.75 rows=1 width=333) (actual
time=57515.333..57515.334 rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14479340.89
rows=7968 width=333) (actual time=57515.331..57515.332 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"f2d20507-a543-4c38-b3e3-c977ac407b42"}'::jsonb)
Rows Removed by Filter: 30847879
Planning Time: 0.147 ms
Execution Time: 57515.418 ms
(6 rows)
```
As you can see, both queries are **identical**, yet the first query is
**orders of magnitude faster** than the second. This difference occurs
because the **candidate record** for the first query appears **towards the
beginning** of the `fn_ledger_pkey` index, allowing it to be found quickly.
In contrast, the candidate record for the second query appears **towards the
end** of the index, requiring significantly more scanning.

Beyond this inconsistency, if the **`LIMIT` value exceeds the actual number
of records found by the query**, the index scan will have to **traverse the
entire index**, regardless of the candidate record’s position. This further
degrades performance.
```
postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'
order by id
limit 1;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..1816.41 rows=1 width=333) (actual time=0.013..0.014
rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14564849.49
rows=8021 width=333) (actual time=0.013..0.013 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
Planning Time: 0.145 ms
Execution Time: 0.027 ms
(5 rows)

postgres=> explain analyze
select *
from fn_ledger
where fn_ledger.provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'
order by id
limit 2;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..3632.25 rows=2 width=333) (actual time=0.017..126786.189
rows=1 loops=1)
-> Index Scan using fn_ledger_pkey on fn_ledger (cost=0.57..14564849.49
rows=8021 width=333) (actual time=0.016..126786.187 rows=1 loops=1)
Filter: (provenance @> '{"locator":
"9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
Rows Removed by Filter: 80472490
Planning Time: 0.148 ms
Execution Time: 126786.246 ms
(6 rows)
```
As you can see from the above queries, the only difference is that the
second query's `LIMIT` is 2 instead of 1.

In conclusion, it seems that the query planner is **placing too much
emphasis on the `LIMIT` clause**, prioritizing it over selecting the more
efficient `ix_fn_ledger_provenance` index. As a result, it **ignores the
optimal indexing strategy** and instead opts for a less efficient scan on
`fn_ledger_pkey`, leading to inconsistent and suboptimal query performance.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2025-03-07 18:00:01 BUG #18835: spgist index fails to accept point with NaN
Previous Message Greg Sabino Mullane 2025-03-07 14:23:40 Re: Window Functions with identical PARTITION BY and ORDER BY clauses evaluated separately