Re: Indexing on JSONB field not working

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Zhihong Zhang <zhihong(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Indexing on JSONB field not working
Date: 2019-12-21 15:17:26
Message-ID: CAMkU=1z8X1s0WChsgKU7SN8ngrLKHt38UhAqKn-nGTf+xbQ9zg@mail.gmail.com
Lists: pgsql-bugs

On Sat, Dec 21, 2019 at 7:00 AM Zhihong Zhang <zhihong(at)gmail(dot)com> wrote:

> Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
> SET enable_seqscan = off;
> explain analyze select id, _doc->>'floatValue' from assets where (_doc #>
> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236
> rows=7 loops=1)"
> " -> Index Scan using assets_floatvalue_idx on assets
> (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230
> rows=7 loops=1)"
> " Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision
> < '3'::double precision)"
> "Planning Time: 1.617 ms"
> "Execution Time: 0.276 ms"
>

313 is a pretty high estimate for fetching an estimated 100 rows. It must
think that nearly every row fetched from the table is going to be a random
page fetch. Which means that it must think the correlation between
assets_floatvalue_idx and physical table order is close to 0.
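You can check that directly: PostgreSQL exposes the statistics it gathered for an expression index under the index's own name in pg_stats. A sketch, assuming the index name shown in the plan above:

```sql
-- Statistics for an expression index are stored under the index's name,
-- not the table's; 'assets_floatvalue_idx' is taken from the plan above.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'assets_floatvalue_idx';
```

A correlation near 0 would confirm why the planner prices each index-driven heap fetch as a random page read.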

>
> SET enable_seqscan = on;
> explain analyze select id, _doc->>'floatValue' from assets where (_doc #>
> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
> "Limit (cost=0.00..107.95 rows=100 width=53) (actual
> time=41021.311..313501.746 rows=7 loops=1)"
> " -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)
> (actual time=41021.309..313501.732 rows=7 loops=1)"
> " Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
> '3'::double precision)"
> " Rows Removed by Filter: 2602824"
> "Planning Time: 0.283 ms"
> "Execution Time: 313501.777 ms"
>
>
It thinks it will find 867607 rows which meet the <3.0 condition, but
really it only finds 7. It has to scan the full table, because with only 7
rows it can never stop early due to the LIMIT 100. Why is the estimate
wrong by a factor of over 100,000? It should be using the statistics from
the expression index here (even though it is not using the index during
execution), and so should have pretty good statistics.
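If those statistics are missing or stale (for example, the index was created after the last statistics run), re-analyzing the table rebuilds them, expression-index stats included. A hedged suggestion, assuming the table name from the plans above:

```sql
-- ANALYZE recomputes planner statistics for the table, including the
-- per-expression statistics attached to its expression indexes.
ANALYZE assets;
```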

Can you show the output of:

select * from pg_stats where tablename = 'float_number_index_path2';

For readability, use the output format which shows the columns down the
screen, not across. In psql, that would be toggled on with \x.
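In a psql session that would look something like this (expanded output stays on until toggled off again):

```sql
-- \x is a psql meta-command, not SQL: it toggles expanded display,
-- printing each column of a row on its own line.
\x
select * from pg_stats where tablename = 'float_number_index_path2';
```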

Cheers,

Jeff
