Re: Selectivity and row count estimates for JSONB columns

From: Joel Perren <joel(dot)perren(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Selectivity and row count estimates for JSONB columns
Date: 2021-10-20 16:08:01
Message-ID: CACFz3n3MRnyKa_hMBxYPRnAa-cFv67AwhjDwEB2mzicF4F+RKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

After some further investigation I think I might have actually got this the
wrong way round. I now strongly suspect that what I am seeing is related to
the presence of histogram_bounds for Table A (possibly a bug? or me being
silly).

Looking into the code (
https://github.com/postgres/postgres/blob/8c1144ba73478b818d9cebe8ecd64a14b7d45bde/src/backend/utils/adt/selfuncs.c
):
- The matchingsel() function calls the generic_restriction_selectivity()
function
- generic_restriction_selectivity() returns the default value (0.01) for
data types that Postgres doesn't collect standard MCV and/or histogram
statistics for. I think this is what happens with Table B which (quite
correctly) does not have these statistics in pg_stats
- Table A does have histogram_bounds for some reason, so I suspect it gets
further through the function, operating on garbage, to this line:

if (selec < 0.0001)
selec = 0.0001;

which ultimate ends up getting returned.

Therefore, Table A gets the 'correct' default value of 0.01 (which is
actually 1%, silly me). Table B meanwhile gets an erroneous value of 0.0001
(0.1%) .

So my question now is: why is Postgres creating histogram_bounds on some
jsonb columns but not others? I've looked at my CREATE TABLE scripts and
they are identical. We have the database deployed in another environment
and this issue reoccurs there too.

Any help would be appreciated :)

Thanks
Joel

On Wed, Oct 20, 2021 at 3:56 PM Joel Perren <joel(dot)perren(at)gmail(dot)com> wrote:

> Hi all
>
> I have a Postgres/PostGIS database with two separate database tables each
> of which has the following column structure:
>
> - identifier (text)
> - geometry (geometryz,27700)
> - properties (jsonb)
>
> I have created a GIN index (jsonb_path_ops) over the properties column on
> both tables.
>
> As I understand it, Postgres is unable to calculate statistics on the
> contents of JSONB columns and so should use hard-coded estimates of
> selectivity when planning queries. However, despite both tables having
> identical structures, similar row counts, identical indexes, and both
> having been recently ANALYZEd, I am finding that the query planner is
> producing different row count estimates between the tables.
>
> To expand: both tables have a 'description' field within their JSONB
> columns. I am executing the following query:
>
> EXPLAIN
> SELECT identifier, geometry, properties FROM table
> WHERE properties @@ '$.description == "test"'
> ORDER BY identifier;
>
> I am expecting that the selectivity value used for such queries should be
> 0.010 which is the default result of the matchingsel selectivity function
> which the @@ operator uses when operating on jsonb.
>
> For both tables, the planner opts for a Bitmap Index Scan -> Sort ->
> Gather Merge. However, the estimated number of rows returned from the Index
> Scan node differs substantially.
>
> - Table A (1,611,752 rows): 159 estimated rows (this is roughly 0.01% and
> makes sense);
> - Table B (1,656,110 rows): 16566 estimated rows (roughly 1% - why??)
>
> This difference is causing the planner to come up with some strange plans
> for queries on Table B which it does not do for Table A. I know that both
> estimates are just that and that neither is 'better', but I would really
> prefer it to be consistent and use a selectivity of 0.01 rather than 0.1 as
> it should be doing.
>
> I know this is getting long now, so apologies. But one thing I did notice
> and wondered if it could be related is that the pg_stats table for Table A
> seems to have values for histogram_bounds and correlation, but nulls for
> these values for Table B.
>
> Any ideas?
>
> Thanks
> Joel
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Radoslav Nedyalkov 2021-10-20 20:29:20 pg_class.oid at 4B
Previous Message Vijaykumar Jain 2021-10-20 15:43:13 Re: Can db user change own password?