
Re: PoC/WIP: Extended statistics on expressions

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: PoC/WIP: Extended statistics on expressions
Date:	2021-03-18 07:54:38
Message-ID:	CAEZATCWgZPJGbj2ndXsoD6_MuJ=H1Y-gdTMdkE=n1VsA6WX+RA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, 17 Mar 2021 at 21:31, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> I agree applying at least the [(a+b),c] stats is probably the right
> approach, as it means we're considering at least the available
> information about dependence between the columns.
>
> I think to improve this, we'll need to teach the code to use overlapping
> statistics, a bit like conditional probability. In this case we might do
> something like this:
>
> ndistinct((a+b),c) * (ndistinct((c+d)) / ndistinct(c))

Yes, I was thinking the same thing. That would be equivalent to
applying a multiplicative "correction" factor of

ndistinct(a,b,c,...) / ( ndistinct(a) * ndistinct(b) * ndistinct(c) * ... )

for each multivariate stat applicable to more than one
column/expression, regardless of whether those columns were already
covered by other multivariate stats. That might well simplify the
implementation, and would probably produce better estimates as well.
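Below is a minimal Python sketch of the correction-factor idea. It is only an illustration, not PostgreSQL's estimator, which is written in C. The helper `estimate_group_ndistinct` and all the ndistinct values are hypothetical. The sketch starts from the independence assumption (the product of per-column ndistinct values). It then multiplies in one correction factor, ndistinct(a,b,...) / (ndistinct(a) * ndistinct(b) * ...), for each multivariate statistic that covers more than one grouped column/expression, whether or not those columns are also covered by another statistic.

```python
import math


def estimate_group_ndistinct(single, multi):
    """Hypothetical sketch of a GROUP BY ndistinct estimate.

    single: dict mapping column/expression name -> per-column ndistinct
    multi:  list of (names_tuple, ndistinct) for multivariate statistics
    """
    # Independence assumption: product of the per-column ndistinct values.
    estimate = math.prod(single.values())
    for names, nd in multi:
        # Use only stats on more than one column/expression, and only when
        # every column/expression they cover is part of the grouping.
        if len(names) < 2 or not all(n in single for n in names):
            continue
        # Correction factor: ndistinct(a,b,...) / (ndistinct(a) * ndistinct(b) * ...).
        # Applied even if these columns overlap with another statistic.
        estimate *= nd / math.prod(single[n] for n in names)
    return estimate


if __name__ == "__main__":
    # Hypothetical per-column ndistinct values for GROUP BY (a+b), c, (c+d).
    single = {"(a+b)": 100, "c": 10, "(c+d)": 50}
    # Hypothetical multivariate stats on [(a+b), c] and [c, (c+d)].
    multi = [(("(a+b)", "c"), 200), (("c", "(c+d)"), 60)]
    print(estimate_group_ndistinct(single, multi))
```

With these made-up numbers the estimate comes out at about 1200, the same as 200 * 60 / 10. Because the two statistics overlap only in c, the per-column terms cancel, and the result reduces to ndistinct((a+b),c) * ndistinct(c,(c+d)) / ndistinct(c), a conditional-probability style product.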

> But that's clearly a matter for a future patch, and I'm sure there are
> cases where this will produce worse estimates.

Agreed.

> Anyway, I plan to go over the patches one more time, and start pushing
> them sometime early next week. I don't want to leave it until the very
> last moment in the CF.

+1. I think they're in good enough shape for that process to start.

Regards,
Dean

In response to

Re: PoC/WIP: Extended statistics on expressions at 2021-03-17 21:30:59 from Tomas Vondra

Responses

Re: PoC/WIP: Extended statistics on expressions at 2021-03-24 00:51:09 from Tomas Vondra
