Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-24 07:36:31
Message-ID: CAEZATCUHsbMLRxMf67hi05nRu=DmwNm1VW=NeXn4JVw2T=SWog@mail.gmail.com
Lists: pgsql-hackers

On Sun, 24 Mar 2019 at 00:17, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On Sun, 24 Mar 2019 at 12:41, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > On 3/21/19 4:05 PM, David Rowley wrote:
>
> > > 29. Looking at the tests I see you're testing that you get bad
> > > estimates without extended stats. That does not really seem like
> > > something that should be done in tests that are meant for extended
> > > statistics.
> > >
> >
> > True, it might be a bit unnecessary. Initially the tests were meant to
> > show old/new estimates for development purposes, but it might not be
> > appropriate for regression tests. I don't think it's a big issue, it's
> > not like it'd slow down the tests significantly. Opinions?
>
> My thoughts were that if someone did something to improve non-MV
> stats, then is it right for these tests to fail? What should the
> developer do in the case? update the expected result? remove the test?
> It's not so clear.
>

I think the tests are fine as they are. Don't think of these as "good"
and "bad" estimates. They should both be "good" estimates, but under
different assumptions -- one assuming no correlation between columns,
and one taking into account the relationship between the columns. If
someone does do something to "improve" the non-MV stats, then the
former tests ought to tell us whether it really was an improvement. If
so, then the test result can be updated and perhaps whatever was done
ought to be factored into the MV-stats' calculation of base
frequencies. If not, the test is providing valuable feedback that
perhaps it wasn't such a good improvement after all.
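The two kinds of estimate can be made concrete with a toy sketch. This is not PostgreSQL planner code. It assumes two equality clauses on hypothetical columns a and b, and it computes exact frequencies over all rows, whereas real statistics come from a sample and include no-match and clamping rules that are not modelled here. The helper `selectivities` and the data are illustrative only. The "independent" figure multiplies per-column selectivities. PostgreSQL's MCV lists report the same quantity for each item as its base frequency. The "joint" figure uses the observed frequency of the value combination itself.

```python
from collections import Counter


def selectivities(rows, a_val, b_val):
    """Hypothetical helper: estimate the selectivity of
    "a = a_val AND b = b_val" under two assumptions, using exact
    frequencies over all rows (real statistics would use a sample)."""
    n = len(rows)
    freq_a = Counter(a for a, _ in rows)
    freq_b = Counter(b for _, b in rows)
    freq_ab = Counter(rows)

    # Per-column selectivities, as single-column statistics would give them.
    sel_a = freq_a[a_val] / n
    sel_b = freq_b[b_val] / n

    # Assumption 1: the columns are independent, so multiply.
    # This product corresponds to an MCV item's base frequency.
    independent = sel_a * sel_b

    # Assumption 2: use the observed frequency of the combination itself.
    joint = freq_ab[(a_val, b_val)] / n
    return independent, joint


# Toy data: b always equals a; 10 distinct values, 100 rows each.
rows = [(i % 10, i % 10) for i in range(1000)]
independent, joint = selectivities(rows, 1, 1)
print(round(independent * len(rows)), round(joint * len(rows)))  # prints "10 100"
```

Each number is a reasonable estimate under its own assumption. With uncorrelated columns, for example a = i % 10 and b = (i // 10) % 10, both come out at 10 rows. They diverge only when the columns are related.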

Regards,
Dean
