Re: Indexing on JSONB field not working

From: Zhihong Zhang <zhihong(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Indexing on JSONB field not working
Date: 2019-12-26 15:17:51
Message-ID: 0AA828F3-5BA5-48CD-B229-8351ACB308D3@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing wrong?

If it’s a bug, can I have a tracking or ticket number?

Thanks!

Zhihong

> On Dec 21, 2019, at 10:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>>> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>>> " -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
>>> " Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>
>> Well, this confirms what I suspected before - the optimizer believes the
>> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
>> that mismatches the actual performance. The question is why ...
>
> The planner evidently believes that 867607 rows will match the query
> condition, so it expects that the scan will stop (after collecting
> 100 rows) very quickly. In reality only 7 rows match, so the scan
> has to run to completion. This is what's bollixing the plan choice.
>
> I suspect that 867607 is just a default estimate, but if ANALYZE has
> been run then there should be stats for the index column, so why isn't
> it doing better? When I try a similar case here, I get good estimates:
>
> regression=# select version();
> version
> ---------------------------------------------------------------------------------------------------------
> PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
> (1 row)
>
> regression=# create table public.assets(_doc jsonb);
> CREATE TABLE
> regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
> CREATE INDEX
> regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
> INSERT 0 10000
> regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------
> Limit (cost=0.00..7.71 rows=100 width=32)
> -> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32)
> Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
> (3 rows)
>
> regression=# analyze assets;
> ANALYZE
> regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------
> Limit (cost=0.29..8.33 rows=2 width=32)
> -> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32)
> Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
> (3 rows)
>
> The "3808" estimate is just a default for '<' with no stats, but
> with stats I get a dead-on estimate.
>
> regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-12-26 15:27:27 Re: Indexing on JSONB field not working
Previous Message Mahendra Singh 2019-12-26 14:50:14 Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema