Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-27 00:36:09
Message-ID: 49348328-1a02-7e45-e50f-e1d4d0243c5c@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Dean,

Here is an updated patch (hopefully) fixing the bugs you've reported so
far. In particular, it fixes this:

1) mostly harmless memset bug in UpdateStatisticsForTypeChange

2) passing the right list (stat_clauses) to mcv_clauselist_selectivity

3) corrections to a couple of outdated comments

4) handling of NOT clauses in MCV lists (and in histograms)

The query you posted does not fail anymore, but there's a room for
improvement. We should be able to handle queries like this:

select * from foo where a=1 and not b=1;

But we don't, because we only recognize F_EQSEL, F_SCALARLTSEL and
F_SCALARGTSEL, but F_NEQSEL (which is what "not b=1" uses). Should be
simple to fix, I believe.

5) handling of mcv_lowsel in statext_clauselist_selectivity

I do believe the new behavior is correct - as I suspected, I broke this
during the last rebase, where I also moved some stuff from the histogram
part to the MCV part. I've also added the (sum of MCV frequencies), as
you suggested.

I think we could improve the estimate further by computing ndistinct
estimate, and then using that to compute average frequency of non-MCV
items. Essentially what var_eq_const does:

if (otherdistinct > 1)
selec /= otherdistinct;

Not sure how to do that when there are not just equality clauses.

BTW I think there's a bug in handling the fullmatch flag - it should not
be passed to AND/OR subclauses the way it is, because then

WHERE a=1 OR (a=2 AND b=2)

will probably set it to 'true' because of (a=2 AND b=2). Which will
short-circuit the statext_clauselist_selectivity, forcing it to ignore
the non-MCV part.

But that's something I need to look at more closely tomorrow. Another
thing I probably need to do is to add more regression tests, protecting
against bugs similar to those you found.

Thanks for the feedback so far!

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists-20180327.patch.gz application/gzip 33.5 KB
0002-multivariate-histograms-20180327.patch.gz application/gzip 42.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-03-27 00:38:42 Re: Parallel Aggregates for string_agg and array_agg
Previous Message Alvaro Herrera 2018-03-27 00:27:40 Re: ppc64le support in 9.3 branch?