| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | zheng_xianghang(at)163(dot)com |
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19443: GIN jsonb_path_ops index not used when many rows contain nested null values (PostgreSQL 18.3) |
| Date: | 2026-03-30 19:03:46 |
| Message-ID: | 2546010.1774897426@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> When a jsonb column has a high proportion of rows with {"a": null}, and a
> small
> subset with valid nested paths like {"a":{"b":{"c":1}}}, the query planner
> chooses a sequential scan instead of using a jsonb_path_ops GIN index for
> the
> path query @? '$.a.b.c' LIMIT 10.
> This leads to unexpected performance degradation.
If I force the plan selection with enable_seqscan, the query gets
slower, and not by a small amount:
regression=# EXPLAIN (ANALYZE)
SELECT * FROM test_jsonb
WHERE data @? '$.a.b.c'
LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.91 rows=10 width=22) (actual time=0.009..0.024 rows=10 loops=1)
-> Seq Scan on test_jsonb (cost=0.00..1938.00 rows=10130 width=22) (actual time=0.008..0.022 rows=10 loops=1)
Filter: (data @? '$."a"."b"."c"'::jsonpath)
Rows Removed by Filter: 75
Planning Time: 0.060 ms
Execution Time: 0.036 ms
(6 rows)
regression=# set enable_seqscan TO 0;
SET
regression=# EXPLAIN (ANALYZE)
SELECT * FROM test_jsonb
WHERE data @? '$.a.b.c'
LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=150.51..151.31 rows=10 width=22) (actual time=4.963..4.979 rows=10 loops=1)
-> Bitmap Heap Scan on test_jsonb (cost=150.51..965.13 rows=10130 width=22) (actual time=4.963..4.977 rows=10 loops=1)
Recheck Cond: (data @? '$."a"."b"."c"'::jsonpath)
Rows Removed by Index Recheck: 75
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_test_jsonb (cost=0.00..147.98 rows=10130 width=0) (actual time=4.900..4.900 rows=100000 loops=1)
Index Cond: (data @? '$."a"."b"."c"'::jsonpath)
Planning Time: 0.122 ms
Execution Time: 4.998 ms
(9 rows)
So it appears to me that the planner's decision was correct.
It looks like the index is unable to distinguish real matches
from false matches in this example, so it returns all 100000
rows and the bitmap heap scan has to filter most of them away.
So that's why it's slow. I'm not sure offhand if the planner's
cost model is accurately modeling this situation or if it
just got lucky ... but in any case, you've not demonstrated
that there's a planner bug.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Justin Christensen | 2026-03-30 22:27:04 | Surprising behavior with pushing predicates down into a view |
| Previous Message | Tom Lane | 2026-03-30 18:51:08 | Re: BUG #19446: Domain DEFAULT not reflected in system catalogs and information_schema (PG 18.3) |