Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-01-14 19:21:45
Message-ID: 78cc09c8-a9aa-9d7b-06f5-c2f9eba08059@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 1/14/19 4:31 PM, Tomas Vondra wrote:
>
> On 1/14/19 12:20 PM, Dean Rasheed wrote:
>> (Removing Adrien from the CC list, because messages to that address
>> keep bouncing)
>>
>> On Sun, 13 Jan 2019 at 00:31, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>> On 1/10/19 6:09 PM, Dean Rasheed wrote:
>>>>
>>>> In the previous discussion around UpdateStatisticsForTypeChange(), the
>>>> consensus appeared to be that we should just unconditionally drop all
>>>> extended statistics when ALTER TABLE changes the type of an included
>>>> column (just as we do for per-column stats), since such a type change
>>>> can rewrite the data in arbitrary ways, so there's no reason to assume
>>>> that the old stats are still valid. I think it makes sense to extract
>>>> that as a separate patch to be committed ahead of these ones, and I'd
>>>> also argue for back-patching it.
>>>
>>> Wasn't the agreement to keep stats that don't include column values
>>> (functional dependencies and ndistinct coefficients), and reset only
>>> more complex stats? That's what happens in master and how it's extended
>>> by the patch for MCV lists and histograms.
>>>
>>
>> Ah OK, I misremembered the exact conclusion reached last time. In that
>> case the logic in UpdateStatisticsForTypeChange() looks wrong:
>>
>> /*
>> * If we can leave the statistics as it is, just do minimal cleanup
>> * and we're done.
>> */
>> if (!attribute_referenced && reset_stats)
>> {
>> ReleaseSysCache(oldtup);
>> return;
>> }
>>
>> That should be "|| !reset_stats", or have more parentheses.
>
> Yeah, it should have been
>
> if (!(attribute_referenced && reset_stats))
>
> i.e. there's a parenthesis missing. Thanks for noticing this. I guess a
> regression test for this would be useful.
>
>> In fact, I think that computing attribute_referenced is unnecessary
>> because the dependency information includes the columns that the
>> stats are for and ATExecAlterColumnType() uses that, so
>> attribute_referenced will always be true.
> Hmmm. I'm pretty sure I came to the conclusion it's in fact necessary,
> but I might be wrong. Will check.
>

Turns out you were right - the attribute_referenced piece was quite
unnecessary. So I've removed it. I've also extended the regression tests
to verify changing type of another column does not reset the stats.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists-20190114.patch.gz application/gzip 40.5 KB
0002-multivariate-histograms-20190114.patch.gz application/gzip 47.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-14 20:48:32 Re: "SELECT ... FROM DUAL" is not quite as silly as it appears
Previous Message Andres Freund 2019-01-14 18:47:48 Re: Reducing header interdependencies around heapam.h et al.