Re: Indexing on JSONB field not working

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Zhihong Zhang <zhihong(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Indexing on JSONB field not working
Date: 2019-12-21 15:49:16
Message-ID: 21855.1576943356@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>> " -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
>> " Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

> Well, this confirms what I suspected before - the optimizer believes the
> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
> that mismatches the actual performance. The question is why ...

The planner evidently believes that 867607 rows will match the query
condition, so it expects that the scan will stop (after collecting
100 rows) very quickly. In reality only 7 rows match, so the scan
has to run to completion. This is what's bollixing the plan choice.

I suspect that 867607 is just a default estimate, but if ANALYZE has
been run then there should be stats for the index column, so why isn't
it doing better? When I try a similar case here, I get good estimates:

regression=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
(1 row)

regression=# create table public.assets(_doc jsonb);
CREATE TABLE
regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
CREATE INDEX
regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
INSERT 0 10000
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------
Limit (cost=0.00..7.71 rows=100 width=32)
-> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32)
Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

regression=# analyze assets;
ANALYZE
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.29..8.33 rows=2 width=32)
-> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

The "3808" estimate is just a default for '<' with no stats, but
with stats I get a dead-on estimate.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2019-12-21 18:00:06 Re: BUG #16151: startup timing problem
Previous Message Jeff Janes 2019-12-21 15:17:26 Re: Indexing on JSONB field not working