Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-28 00:34:17
Message-ID: 681c6a78-2aef-769e-bd0f-290ac650e71d@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 03/27/2018 07:34 PM, Tomas Vondra wrote:
> On 03/27/2018 07:03 PM, Dean Rasheed wrote:
>> On 27 March 2018 at 14:58, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>> On 27 March 2018 at 01:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>> 4) handling of NOT clauses in MCV lists (and in histograms)
>>>>
>>>> The query you posted does not fail anymore...
>>>>
>>> Ah, it turns out the previous query wasn't actually failing for the
>>> reason I thought it was -- it was failing because it had a
>>> ScalarArrayOpExpr that was being passed to
>>> mcv_clauselist_selectivity() because of the wrong list being passed to
>>> it. I could see from the code that a NOT clause would have tripped it
>>> up, but most NOT clauses actually get rewritten by negate_clause() so
>>> they end up not being NOT clauses.
>>>
>>
>> Thinking about that some, I think that the only NOT clauses this needs
>> to actually worry about are NOTs of boolean Vars. Anything else that
>> this code supports will have been transformed into something other
>> than a NOT before reaching this point. Thus it might be much simpler
>> to handle that as a special case in statext_is_compatible_clause() and
>> mcv_update_match_bitmap(), rather than trying to support general NOT
>> clauses, and going through a recursive call to
>> mcv_update_match_bitmap(), and then having to merge bitmaps. NOT of a
>> boolean Var could then be treated just like var=false, setting the
>> appropriate attribute match entry if it's found in the MCV list. This
>> would allow clauses like (a=1 and NOT b) to be supported, which I
>> don't think currently works, because fullmatch won't get set.
>>
>
> Yes, I came to the same conclusion ;-) I'll send an updated patch later
> today.
>

Attached is a patch fixing this. In the end I've decided to keep both
branches - one handling boolean Vars and one for NOT clauses. I think
you're right we can only see (NOT var) cases, but I'm not sure about that.

For example, what if an operator does not have a negator? Then we can't
transform NOT (a AND b) => (NOT a OR NOT b), I guess. So I kept this for
now, and we can remove this later.

I've added scalarneqsel, scalarlesel and scalargesel so that we
recognize those cases correctly. This fixes surprising behavior where
"obviously compatible" clauses like (a=1 AND b<1) became incompatible
when NOT was used, because

NOT (a=1 AND b<1) = (a!=1 OR b>=1)

In my defense, the scalarlesel/scalargesel were introduced fairly
recently, I think.

I've also realized that the "fullmatch" flag is somewhat confused,
because some places interpreted it as "there is equality on each
attribute" but in fact it also required an actual MCV match. So when the
value was rare (not in MCV), it was always false.

There's a WIP part 0002, which should eventually be merged into 0001. It
should properly detect the case when each column has an equality, simply
by counting the top-level equality clauses (I'm not sure about the more
complex cases yet).

Another improvement done in this part is the ndistinct estimate. It
simply extracts Vars (from the top--level equality clauses, because it's
directly related to the fullmatch semantics), and uses that to compute
average frequency of non-MCV items. The function is just a simplified
version of the estimate_num_groups(), for this single-relation case.

BTW an unsquashed tag with those fixes is here:

https://github.com/tvondra/postgres/tree/mvstats-20180328

it may be more convenient to quickly check the differences than
comparing the patches.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists-20180328.patch.gz application/gzip 33.7 KB
0002-WIP-detecting-full-equality-matches-and-ndi-20180328.patch.gz application/gzip 4.1 KB
0003-multivariate-histograms-20180328.patch.gz application/gzip 43.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-03-28 00:43:29 Re: PL/pgSQL nested CALL with transactions
Previous Message Haribabu Kommi 2018-03-28 00:28:32 Re: Enhance pg_stat_wal_receiver view to display connected host