Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-27 17:42:39
Message-ID: fb20a5d8-032c-0678-10c8-522b695af7dd@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/27/2018 04:58 PM, Dean Rasheed wrote:
> On 27 March 2018 at 01:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> BTW I think there's a bug in handling the fullmatch flag - it should not
>> be passed to AND/OR subclauses the way it is, because then
>>
>> WHERE a=1 OR (a=2 AND b=2)
>>
>> will probably set it to 'true' because of (a=2 AND b=2). Which will
>> short-circuit the statext_clauselist_selectivity, forcing it to ignore
>> the non-MCV part.
>>
>
> I'm not sure that's true. Won't the outer call to
> mcv_update_match_bitmap() overwrite the value of fullmatch returned by
> the nested call, and set fullmatch to false because it has only seen 1
> attribute equality match? I think that's the correct result, but I
> think that's just luck.
>
> The dubious part is the way fullmatch is calculated for OR clauses --
> I think for an OR clause we want to know the attributes matched in
> *every* subclause, rather than in *any* subclause, as we do for AND.
> So I think the only way an OR clause at the top-level should return a
> full match is if every sub-clause was a full match, for example:
>
> WHERE (a=1 AND b=2) OR (a=2 AND b=1)
>

Yes, that seems like the right behavior.

> But then consider this:
>
> WHERE a=1 AND (b=1 OR b=2)
>
> That should also potentially be a full match, but that can only work
> if mcv_update_match_bitmap() returned the set of matching attributes
> (eqmatches), rather than fullmatch, so that it can be merged
> appropriately in the caller. So for an OR clause, it needs to return
> eqmatches containing the list of attributes for which every sub-clause
> matched with equality against the MCV list, and in an outer AND clause
> that can be added to the outer eqmatches list, which is the list of
> attributes for which any sub-clause matched with equality.
>

I think it's useful to see it transformed from:

WHERE a=1 AND (b=1 OR b=2)

to

WHERE (a=1 AND b=1) OR (a=1 AND b=2)

which is the case already handled above. And yes, tracking columns with
an equality seems reasonable.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2018-03-27 17:46:49 Re: [HACKERS] path toward faster partition pruning
Previous Message Tomas Vondra 2018-03-27 17:34:26 Re: [HACKERS] PATCH: multivariate histograms and MCV lists