Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-27 14:58:09
Message-ID: CAEZATCVKFCLi4NcXBU9ETvOO4WouVr8nfaUkeW6QUg_87YXBUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27 March 2018 at 01:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> BTW I think there's a bug in handling the fullmatch flag - it should not
> be passed to AND/OR subclauses the way it is, because then
>
> WHERE a=1 OR (a=2 AND b=2)
>
> will probably set it to 'true' because of (a=2 AND b=2). Which will
> short-circuit the statext_clauselist_selectivity, forcing it to ignore
> the non-MCV part.
>

I'm not sure that's true. Won't the outer call to
mcv_update_match_bitmap() overwrite the value of fullmatch returned by
the nested call, and set fullmatch to false because it has only seen 1
attribute equality match? I think that's the correct result, but I
think that's just luck.

The dubious part is the way fullmatch is calculated for OR clauses --
I think for an OR clause we want to know the attributes matched in
*every* subclause, rather than in *any* subclause, as we do for AND.
So I think the only way an OR clause at the top-level should return a
full match is if every sub-clause was a full match, for example:

WHERE (a=1 AND b=2) OR (a=2 AND b=1)

But then consider this:

WHERE a=1 AND (b=1 OR b=2)

That should also potentially be a full match, but that can only work
if mcv_update_match_bitmap() returned the set of matching attributes
(eqmatches), rather than fullmatch, so that it can be merged
appropriately in the caller. So for an OR clause, it needs to return
eqmatches containing the list of attributes for which every sub-clause
matched with equality against the MCV list, and in an outer AND clause
that can be added to the outer eqmatches list, which is the list of
attributes for which any sub-clause matched with equality.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-27 14:58:14 Re: Parallel Aggregates for string_agg and array_agg
Previous Message Alvaro Herrera 2018-03-27 14:54:53 Re: Foreign keys and partitioned tables