Re: Selectivity and row count estimates for JSONB columns

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joel Perren <joel(dot)perren(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Selectivity and row count estimates for JSONB columns
Date: 2021-11-03 19:24:10
Message-ID: 144197.1635967450@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Joel Perren <joel(dot)perren(at)gmail(dot)com> writes:
> - generic_restriction_selectivity() returns the default value (0.01) for
> data types that Postgres doesn't collect standard MCV and/or histogram
> statistics for. I think this is what happens with Table B which (quite
> correctly) does not have these statistics in pg_stats

There's nothing "correct" about that. JSONB does have sorting support
(admittedly with a pretty arbitrary sort order), so I'd expect ANALYZE
to collect a histogram as well as MCV values for it. Perhaps for your
Table B it's omitting the histogram because the MCV list captures the
entire contents of the column? Or you've got auto-analyze disabled
for some reason?

Anyway, assuming you're using v14, the idea of matchingsel() is to
apply the given restriction clause to all the MCV and histogram
entries to see how many satisfy it [1]. In principle this'll produce
a far better estimate than any fixed default could provide. If
you're coming out with a crappy estimate, you might be able to
improve matters by increasing the column's statistics target so
that more MCV and histogram entries are collected.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/12237.1582833074%40sss.pgh.pa.us

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2021-11-03 19:37:38 Re: to_date() and to_timestamp() with negative years
Previous Message Bryn Llewellyn 2021-11-03 18:18:03 Re: to_date() and to_timestamp() with negative years