Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-01-04 00:12:27
Message-ID: 0c41dd0a-78bc-2bbb-43a1-3f8993514240@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/20/2017 02:44 AM, Mark Dilger wrote:
>
>> On Dec 19, 2017, at 4:31 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> Hi,
>>
>> On 12/19/2017 08:17 PM, Mark Dilger wrote:
>>>
>>> I tested your latest patches on my mac os x laptop and got one test
>>> failure due to the results of 'explain' coming up differently. For the record,
>>> I followed these steps:
>>>
>>> cd postgresql/
>>> git pull
>>> # this got my directory up to 8526bcb2df76d5171b4f4d6dc7a97560a73a5eff with no local changes
>>> patch -p 1 < ../0001-multivariate-MCV-lists.patch
>>> patch -p 1 < ../0002-multivariate-histograms.patch
>>> ./configure --prefix=/Users/mark/master/testinstall --enable-cassert --enable-tap-tests --enable-depend && make -j4 && make check-world
>>>
>>
>> Yeah, those steps sounds about right.
>>
>> Apparently this got broken by ecc27d55f4, although I don't quite
>> understand why - but it works fine before. Can you try if it works fine
>> on 9f4992e2a9 and fails with ecc27d55f4?
>
> It succeeds with 9f4992e2a9. It fails with ecc27d55f4. The failures look
> to be the same as I reported previously.
>

Gah, this turned out to be a silly bug. The ecc27d55f4 commit does:

... and fix dependencies_clauselist_selectivity() so that
estimatedclauses actually is a pure output argument as stated by
its API contract.

which does bring the code in line with the comment stating that
'estimatedclauses' is an output parameter. It wasn't meant to be
strictly output, though, but an input/output one instead (to pass
information about already estimated clauses when applying multiple
statistics).

With only dependencies it did not matter, but with the new MCV and
histogram patches we do this:

Bitmapset *estimatedclauses = NULL;

s1 *= statext_clauselist_selectivity(..., &estimatedclauses);

s1 *= dependencies_clauselist_selectivity(..., &estimatedclauses);

Since ecc27d55f4, the first thing dependencies_clauselist_selectivity
does is resetting estimatedclauses to NULL, throwing away information
about which clauses were estimated by MCV and histogram stats.

Of course, that's something ecc27d55f4 could not predict, but the reset
of estimatedclauses also makes the first loop over clauses rather
confusing, as it also checks the estimatedclauses bitmapset:

listidx = 0;
foreach(l, clauses)
{
Node *clause = (Node *) lfirst(l);

if (!bms_is_member(listidx, *estimatedclauses))
{
...
}

listidx++;
}

Of course, the index can never be part of the bitmapset - we've just
reset it to NULL, and it's the first loop. This does not break anything,
but it's somewhat confusing.

Attached is an updated patch series, where the first patch fixes this by
removing the reset of estimatedclauses (and tweaking the comment).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Keep-information-about-already-estimated-clauses.patch.gz application/gzip 932 bytes
0002-multivariate-MCV-lists.patch.gz application/gzip 31.5 KB
0003-multivariate-histograms.patch.gz application/gzip 60.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-01-04 00:30:34 Re: LIKE foo% optimization easily defeated by OR?
Previous Message Stephen Frost 2018-01-03 23:38:53 January CommitFest is underway!