Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-26 19:17:09
Message-ID: 66ba9c45-3b07-e5b6-fd11-15e75f6ee8e1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/26/2018 09:01 PM, Dean Rasheed wrote:
> On 18 March 2018 at 23:57, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Attached is an updated version of the patch series, addressing issues
>> pointed out by Alvaro.
>
> I've just been reading the new code in
> statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and
> I'm having a hard time convincing myself that it's correct.
>
> This code in statext_clauselist_selectivity() looks a bit odd:
>
> /*
> * Evaluate the MCV selectivity. See if we got a full match and the
> * minimal selectivity.
> */
> if (stat->kind == STATS_EXT_MCV)
> s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid,
> jointype, sjinfo, rel,
> &fullmatch, &mcv_lowsel);
>
> /*
> * If we got a full equality match on the MCV list, we're done (and the
> * estimate is likely pretty good).
> */
> if (fullmatch && (s1 > 0.0))
> return s1;
>
> /*
> * If it's a full match (equalities on all columns) but we haven't found
> * it in the MCV, then we limit the selectivity by frequency of the last
> * MCV item. Otherwise reset it to 1.0.
> */
> if (!fullmatch)
> mcv_lowsel = 1.0;
>
> return Min(s1, mcv_lowsel);
>
> So if fullmatch is true and s1 is greater than 0, it will return s1.
> If fullmatch is true and s1 equals 0, it will return Min(s1,
> mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel
> will be set to 1 and it will return Min(s1, mcv_lowsel) which will
> also be s1. So it always just returns s1, no? Maybe there's no point
> in computing fullmatch.
>

Hmmm, I think you're right. It probably got broken in the last rebase,
when I moved a bunch of code from the histogram part to the MCV one.
I'll take a look.

> Also, wouldn't mcv_lowsel potentially be a significant overestimate
> anyway? Perhaps 1 minus the sum of the MCV frequencies might be
> closer, but even that ought to take into account the number of
> distinct values remaining, although that information may not always be
> available.
>

That is definitely true. 1 minus the sum of the MCV frequencies, and I
suppose we might even improve that if we had some ndistinct estimate on
those columns to compute an average.

> Also, just above that, in statext_clauselist_selectivity(), it
> computes the list stat_clauses, then doesn't appear to use it
> anywhere. I think that would have been the appropriate thing to pass
> to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses
> into mcv_clauselist_selectivity() will cause it to fail to find any
> matches and then underestimate.
>

Will check.

> I've also come across a few incorrect/out-of-date comments:
>
> /*
> * mcv_clauselist_selectivity
> * Return the estimated selectivity of the given clauses using MCV list
> * statistics, or 1.0 if no useful MCV list statistic exists.
> */
>
> -- I can't see any code path that returns 1.0 if there are no MCV
> stats. The last part of that comment is probably more appropriate to
> statext_clauselist_selectivity()
>
>
> /*
> * mcv_update_match_bitmap
> * [snip]
> * The function returns the number of items currently marked as 'match', and
> * ...
>
> -- it doesn't seem to return the number of items marked as 'match'.
>
> Then inside that function, this comment is wrong (copied from the
> preceding comment):
>
> /* AND clauses assume nothing matches, initially */
> memset(bool_matches, STATS_MATCH_FULL, sizeof(char) *
> mcvlist->nitems);
>
> Still reading...
>
> Regards,
> Dean
>

Yeah, sorry about that - I forgot to fix those comments after removing
the match counting to simplify the patches.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-26 19:26:00 Re: Why does load_external_function() return PGFunction?
Previous Message Simon Riggs 2018-03-26 19:17:03 Re: [HACKERS] MERGE SQL Statement for PG11