Re: Multi-column distinctness.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: simon(at)2ndQuadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-column distinctness.
Date: 2015-09-07 02:58:39
Message-ID: 20150907.115839.146998043.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

Thank you for pointing that out. It is a crucial point of this
patch. Sorry for not mentioning it.

At Sun, 6 Sep 2015 09:24:48 +0100, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote in <CANP8+j+F+DzrzCW1cjK1Up009TAytYN=P_DNsJ4OZEUJEXywjA(at)mail(dot)gmail(dot)com>
> > Tomas Vondra is now working on heavily-equipped multivariate
> > statistics for OLAP usage. In contrast, this is a lightly
> > implemented solution which calculates only the ratio between the
> > row count estimated by the current method and the actual row count.
> > I think this doesn't conflict with his work except the grammar part.
> >
>
> I think it very obviously does conflict, so I don't see this patch as
> appropriate.
>
> If you think a cut version of Tomas' patch is appropriate, then the usual
> response is to give a review that says "Tomas, I think a cut down version
> is appropriate here, can we reduce the scope of this patch for now?". If
> you have done that and he refuses to listen, then a separate patch version
> is appropriate. Otherwise we should just reject this second patchset to
> avoid confusion and to avoid encouraging people to take this approach.

You are absolutely right in general, and I would agree if this were
'a cut version' of Tomas's patch. I might have the wrong idea about
the size of a piece of work.

I will discontinue this patch if Tomas and/or Simon, or many
others, think it inappropriate to bring up now (or ever)
after reading the following explanation.

======
I already asked Tomas to *add* this feature to his patch and got
a reply that it would come after the completion of the ongoing work.

Tomas's patch and mine generally aim at a similar objective, but
as discussed with Tomas, I understood there are some crucial
differences between them.

He is considering more precise and widely applicable estimation
based on a firm theoretical basis, and as described in the
citation above, it is aimed at OLAP usage and allows rather
complex calculation. It will be reduced through future
discussions, but the priority to do so is not so high for now.

Although Tomas's patch is very complex and needs more work to
complete, a fix for the wrong estimates caused by multicolumn
correlation (especially in OLTP usage) is in demand. So I tried a
patch that suits that objective. It has only a rough distinctness
coefficient and doesn't have MV-MCV, MV-HISTOGRAM, or strict
functional dependencies, but needs quite small storage and less
calculation.
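
To make the idea concrete, here is a minimal sketch of how such a
coefficient could correct an estimate for two correlated columns. It
is only an illustration, not the code in the patch: the function
names, the toy data, and the exact definition of the coefficient
(independence-based distinct count divided by the actual distinct
count) are assumptions made for this example.

```python
# Illustrative sketch only; names and formula are assumptions, not the patch.

def distinctness_coefficient(rows):
    """Distinct pairs expected under independence / distinct pairs observed."""
    nd_a = len({a for a, _ in rows})
    nd_b = len({b for _, b in rows})
    nd_ab = len(set(rows))
    return (nd_a * nd_b) / nd_ab


def estimate_rows_independent(total_rows, nd_a, nd_b):
    """Row estimate for 'a = x AND b = y' assuming the columns are independent."""
    return total_rows / (nd_a * nd_b)


# Toy table with perfectly correlated columns: b always equals a.
rows = [(i % 10, i % 10) for i in range(1000)]

coeff = distinctness_coefficient(rows)                 # (10 * 10) / 10
naive = estimate_rows_independent(len(rows), 10, 10)   # 1000 / 100
corrected = naive * coeff
actual = sum(1 for r in rows if r == (3, 3))

print(coeff, naive, corrected, actual)
```

In this toy case the independence assumption underestimates the
matching rows by a factor of ten, and a single stored number per
column pair is enough to undo that, which is the storage and
calculation trade-off described above.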

The two are so different in concrete objective and
characteristics, and have no common functional piece except the
grammar part, so I concluded that this patch doesn't break the
foothold (standpoint?) of Tomas's patch and we can continue to
work on it as we did until now.

This is why I think this is not a cut-down version of Tomas's and
doesn't break or conflict with it. But if many don't think so, I
should withdraw this, of course.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center