Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-25 23:39:55
Message-ID: ba158539-9886-7170-1e32-ab0092cb0ac7@2ndquadrant.com
Lists: pgsql-hackers

On 3/24/19 8:36 AM, Dean Rasheed wrote:
> On Sun, 24 Mar 2019 at 00:17, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>>
>> On Sun, 24 Mar 2019 at 12:41, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>> On 3/21/19 4:05 PM, David Rowley wrote:
>>
>>>> 29. Looking at the tests I see you're testing that you get bad
>>>> estimates without extended stats. That does not really seem like
>>>> something that should be done in tests that are meant for extended
>>>> statistics.
>>>>
>>>
>>> True, it might be a bit unnecessary. Initially the tests were meant to
>>> show old/new estimates for development purposes, but it might not be
>>> appropriate for regression tests. I don't think it's a big issue, it's
>>> not like it'd slow down the tests significantly. Opinions?
>>
>> My thoughts were that if someone did something to improve non-MV
>> stats, then is it right for these tests to fail? What should the
>> developer do in that case? Update the expected result? Remove the test?
>> It's not so clear.
>>
>
> I think the tests are fine as they are. Don't think of these as "good"
> and "bad" estimates. They should both be "good" estimates, but under
> different assumptions -- one assuming no correlation between columns,
> and one taking into account the relationship between the columns. If
> someone does do something to "improve" the non-MV stats, then the
> former tests ought to tell us whether it really was an improvement. If
> so, then the test result can be updated and perhaps whatever was done
> ought to be factored into the MV-stats' calculation of base
> frequencies. If not, the test is providing valuable feedback that
> perhaps it wasn't such a good improvement after all.
>

Yeah, I agree. I'm sure there are ways to further simplify (or otherwise
improve) the tests, but I think those tests are useful to demonstrate
what the "baseline" estimates are.
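
To make the two kinds of estimate concrete, here is a minimal sketch of the
distinction Dean describes. It is an illustration only, not code from the
patch or from PostgreSQL: the function names and the toy data are made up,
and the "joint" estimate simply counts exact matches, roughly in the spirit
of an MCV list that happens to cover every value combination.

```python
# Hypothetical illustration only: not code from the patch or from PostgreSQL.
from collections import Counter


def independent_estimate(rows, a, b):
    """Estimate rows matching (col_a = a AND col_b = b), assuming the
    two columns are independent: multiply per-column selectivities."""
    n = len(rows)
    sel_a = sum(1 for x, _ in rows if x == a) / n
    sel_b = sum(1 for _, y in rows if y == b) / n
    return sel_a * sel_b * n


def joint_estimate(rows, a, b):
    """Estimate the same clause from the joint frequency of (a, b),
    which accounts for the relationship between the columns."""
    return Counter(rows)[(a, b)]


# Toy data (an assumption): two perfectly correlated columns, b == a.
rows = [(i % 10, i % 10) for i in range(1000)]

baseline = independent_estimate(rows, 1, 1)   # about 10 rows
with_mv_stats = joint_estimate(rows, 1, 1)    # 100 rows
print(baseline, with_mv_stats)
```

Both numbers are "correct" under their own assumptions. The first is the
baseline the regression tests record without extended statistics, and the
second is what the estimate should move towards once the relationship
between the columns is taken into account.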

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
