Re: extended statistics: n-distinct

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, tomas(dot)vondra(at)2ndquadrant(dot)com, dean(dot)a(dot)rasheed(at)gmail(dot)com
Subject: Re: extended statistics: n-distinct
Date: 2017-03-22 21:03:45
Message-ID: 20170322210345.zoqj4tmdyoh23mxm@alvherre.pgsql
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:

> At Mon, 20 Mar 2017 16:02:20 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in <20170320190220(dot)ixlaueanxegqd5gr(at)alvherre(dot)pgsql>

> > This is a new thread to present a version of the n-distinct patch that
> > IMO is close enough to commit. There are some work items still.
> > There's some discussion on the topic of cross-column statistics:
> > https://wiki.postgresql.org/wiki/Cross_Columns_Stats
> >
> > This problem is important enough that Kyotaro Horiguchi submitted
> > another patch that does the same thing:
> > https://www.postgresql.org/message-id/flat/20150828.173334.114731693.horiguchi.kyotaro%40lab.ntt.co.jp
> > This patch aims to provide the same functionality, keeping the design
> > general enough that other kinds of statistics can be added later (such
> > as functional dependencies, histograms and MCVs, all of which have been
> > previously submitted as patches by Tomas).
>
> I may be stupid but I don't get the picture here, specifically
> about the relation to Tomas's patch. Does this work as
> infrastructure for Tomas's mv patch? Or in some other
> relationship?

Well, this patch is Tomas' first patch, which I've reviewed and reworked
-- I changed some things that weren't properly finished, cleaned up the
code, made it all more robust, and made sure the sane cases work sanely
while the others are rejected promptly (rather than throwing bogus error
messages at a later time, or crashing).

I didn't review your own n-distinct patch. I don't think there's any
common code, but it would be very useful if you could try your test
scenarios and make sure they are handled sanely by this patch.

Regarding your question:

> Are you planning to implement correct estimation of joins
> perplexed by strong correlations?

There is a later patch in Tomas' series which I would like to get to
before PG10 closes, but it's not this patch. It needs to be rebased on
top of this one.

Attached is v30, which includes some more cleanup. Detailed commits can
be seen here:
https://github.com/2ndQuadrant/postgres/commits/dev/mvstats-ndistinct
In particular, this includes code from Tomas to consider mixing
ndistinct estimates from multiple multivariate statistic objects, which
is better than the old approach of only using the estimate when a
perfect match was found. However, I lobotomized Tomas' selfuncs.c code
and I need to revert that part before pushing -- essentially I
removed examine_variable() processing, which seemed a bit on the
expensive side for what we were doing, but that was a silly mistake.
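
To illustrate the difference between "perfect match only" and mixing
estimates from several statistics objects, here is a minimal Python
sketch. It is not the code in the patch: the greedy "largest covered
column set first" strategy, the function name estimate_groups, and the
data layout (each statistics object as a dict from column sets to
ndistinct values) are all assumptions made for illustration only.

```python
def estimate_groups(group_cols, per_column_ndistinct, stats_objects):
    """Estimate the number of distinct groups for a GROUP BY column list.

    Hypothetical layout (not PostgreSQL's):
      per_column_ndistinct: dict mapping column name -> ndistinct
      stats_objects: list of dicts mapping frozenset(columns) -> ndistinct
    """
    remaining = set(group_cols)
    estimate = 1.0

    while remaining:
        best_cols, best_nd = None, None
        # Look for the multi-column estimate covering the most columns
        # that are still unaccounted for.
        for stats in stats_objects:
            for cols, nd in stats.items():
                if len(cols) >= 2 and cols <= remaining:
                    if best_cols is None or len(cols) > len(best_cols):
                        best_cols, best_nd = cols, nd
        if best_cols is None:
            break  # no statistics object applies to what is left
        estimate *= best_nd
        remaining -= best_cols

    # Columns not covered by any statistics object fall back to the
    # per-column estimates, multiplied as if they were independent.
    for col in remaining:
        estimate *= per_column_ndistinct[col]
    return estimate


# Two statistics objects, neither of which matches (a, b, c, d, e) exactly.
stats = [{frozenset({"a", "b"}): 100.0}, {frozenset({"c", "d"}): 50.0}]
per_col = {"a": 100, "b": 100, "c": 50, "d": 50, "e": 4}
print(estimate_groups(["a", "b", "c", "d", "e"], per_col, stats))  # 20000.0
```

In this sketch a perfect-match-only rule would find no usable statistics
object for the five-column GROUP BY and would multiply the per-column
values (100 * 100 * 50 * 50 * 4), whereas mixing the two objects gives
100 * 50 * 4.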

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: extstat-ndistinct-30.patch (text/plain, 168.5 KB)
