Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2020-11-16 15:27:46
Message-ID: 3fbbd68b-0b52-cb5a-d468-7e7280c33e6c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/16/20 2:49 PM, Tomas Vondra wrote:
> Hi,
>
> ...
>
> 4) apply the statistics
>
> This is the hard part, really, and the exact state of the support
> depends on type of statistics.
>
> For ndistinct coefficients, it generally works. I'm sure there may be
> bugs in estimate_num_groups, etc. but in principle it works.
>
> For MCV lists, it generally works too - you can define statistics on
> the expressions and the estimates should improve. The main downside
> here is that it requires at least two expressions, otherwise we can't
> build/apply the extended statistics. So for example
>
> SELECT * FROM t WHERE mod(a,100) = 10 AND mod(b,11) = 0
>
> may be estimated "correctly", once you drop any of the conditions it
> gets much worse as we don't have stats for individual expressions.
> That's rather annoying - it does not break the extended MCV, but the
> behavior will certainly cause confusion.
>
> For functional dependencies, the estimation does not work yet. Also,
> the missing per-column statistics have bigger impact than on MCV,
> because while MCV can work fine without it, the dependencies heavily
> rely on the per-column estimates. We only apply "corrections" based
> on the dependency degree, so we still need (good) per-column
> estimates, which does not quite work with the expressions.
>
>
> Of course, the lack of per-expression statistics may be somewhat
> fixed by adding indexes on expressions, but that's kinda expensive.
>

FWIW after re-reading [1], I think the plan to build pg_statistic rows
for expressions and stash them in pg_statistic_ext_data is the way to
go. I was thinking that maybe we'll need some new statistics type to
request this, e.g.

CREATE STATISTICS s (expressions) ON ...

but on second thought I think we should just build this whenever there
are expressions in the definition. It'll require some changes (e.g. we
require at least two items in the list, but we'll want to allow building
stats on a single expression too, I guess), but that's doable.

Of course, we don't have any catalogs with composite types yet, so it's
not 100% sure this will work, but it's worth a try.

regards

[1]
https://www.postgresql.org/message-id/flat/6331.1579041473%40sss.pgh.pa.us#5ec6af7583e84cef2ca6a9e8a713511e

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-11-16 15:32:23 Re: Multi Inserts in CREATE TABLE AS - revived patch
Previous Message Bruce Momjian 2020-11-16 15:26:37 Re: Add important info about ANALYZE after create Functional Index