Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-27 00:36:09
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Hi Dean,

Here is an updated patch (hopefully) fixing the bugs you've reported so
far. In particular, it fixes this:

1) mostly harmless memset bug in UpdateStatisticsForTypeChange

2) passing the right list (stat_clauses) to mcv_clauselist_selectivity

3) corrections to a couple of outdated comments

4) handling of NOT clauses in MCV lists (and in histograms)

The query you posted does not fail anymore, but there's a room for
improvement. We should be able to handle queries like this:

select * from foo where a=1 and not b=1;

But we don't, because we only recognize F_EQSEL, F_SCALARLTSEL and
F_SCALARGTSEL, but F_NEQSEL (which is what "not b=1" uses). Should be
simple to fix, I believe.

5) handling of mcv_lowsel in statext_clauselist_selectivity

I do believe the new behavior is correct - as I suspected, I broke this
during the last rebase, where I also moved some stuff from the histogram
part to the MCV part. I've also added the (sum of MCV frequencies), as
you suggested.

I think we could improve the estimate further by computing ndistinct
estimate, and then using that to compute average frequency of non-MCV
items. Essentially what var_eq_const does:

if (otherdistinct > 1)
selec /= otherdistinct;

Not sure how to do that when there are not just equality clauses.

BTW I think there's a bug in handling the fullmatch flag - it should not
be passed to AND/OR subclauses the way it is, because then

WHERE a=1 OR (a=2 AND b=2)

will probably set it to 'true' because of (a=2 AND b=2). Which will
short-circuit the statext_clauselist_selectivity, forcing it to ignore
the non-MCV part.

But that's something I need to look at more closely tomorrow. Another
thing I probably need to do is to add more regression tests, protecting
against bugs similar to those you found.

Thanks for the feedback so far!


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists-20180327.patch.gz application/gzip 33.5 KB
0002-multivariate-histograms-20180327.patch.gz application/gzip 42.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-03-27 00:38:42 Re: Parallel Aggregates for string_agg and array_agg
Previous Message Alvaro Herrera 2018-03-27 00:27:40 Re: ppc64le support in 9.3 branch?