Re: Indexing on JSONB field not working

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Zhihong Zhang <zhihong(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Indexing on JSONB field not working
Date: 2019-12-21 11:40:03
Message-ID: 20191221114003.yo56lv3o6thdvxtn@development
Lists: pgsql-bugs

On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
>SET enable_seqscan = off;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
>" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"Planning Time: 1.617 ms"
>"Execution Time: 0.276 ms"
>
>SET enable_seqscan = on;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
>" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"
>" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>" Rows Removed by Filter: 2602824"
>"Planning Time: 0.283 ms"
>"Execution Time: 313501.777 ms"
>

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper than the index scan (107.95 vs. 313.25),
but that does not match the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues, both stemming from
how the cost estimate is computed - as a simple linear approximation
over the input relation. For example, we know the seqscan is expected
to produce 867607 rows at a total cost of 936605, so the cost of
producing just 100 rows is estimated as

100 * 936605 / 867607 = 107.95

But that assumes a couple of things: (a) that the seqscan row estimate
is correct, and (b) that the matching rows are uniformly distributed in
the table. If the row count is misestimated, or if the matching rows are
towards the end of the relation (i.e. only reached after doing most of
the costed work), this estimate may be quite off.
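
One simple way to check (a) is to count the matching rows directly and
compare that to the 867607 estimate:

    SELECT count(*)
      FROM assets
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

Given that both plans above found only 7 matching rows, I'd expect the
estimate to be off by several orders of magnitude.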

Can you run EXPLAIN ANALYZE of the query without the LIMIT?
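
That is, something like this (the same query as above, just with the
LIMIT removed):

    EXPLAIN ANALYZE
    SELECT id, _doc->>'floatValue'
      FROM assets
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

That shows the estimated vs. actual row counts without the LIMIT
short-circuiting either plan.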

BTW a LIMIT without an ORDER BY is a bit strange - it's not defined
which 100 rows you get. Also, maybe you could try using a partial index
(if the WHERE condition does not change), along the lines of the sketch
below.
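
For example something like this (the index name is just an example, and
this assumes the < 3.0 threshold is fixed):

    CREATE INDEX assets_floatvalue_lt3_idx ON assets (id)
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

Such an index only contains the rows matching the condition, so it
stays small and scanning it is cheap.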

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
