|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>|
|Cc:||Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: [HACKERS] PATCH: multivariate histograms and MCV lists|
|Views:||Raw Message | Whole Thread | Download mbox|
Here is an updated patch (hopefully) fixing the bugs you've reported so
far. In particular, it fixes this:
1) mostly harmless memset bug in UpdateStatisticsForTypeChange
2) passing the right list (stat_clauses) to mcv_clauselist_selectivity
3) corrections to a couple of outdated comments
4) handling of NOT clauses in MCV lists (and in histograms)
The query you posted does not fail anymore, but there's a room for
improvement. We should be able to handle queries like this:
select * from foo where a=1 and not b=1;
But we don't, because we only recognize F_EQSEL, F_SCALARLTSEL and
F_SCALARGTSEL, but F_NEQSEL (which is what "not b=1" uses). Should be
simple to fix, I believe.
5) handling of mcv_lowsel in statext_clauselist_selectivity
I do believe the new behavior is correct - as I suspected, I broke this
during the last rebase, where I also moved some stuff from the histogram
part to the MCV part. I've also added the (sum of MCV frequencies), as
I think we could improve the estimate further by computing ndistinct
estimate, and then using that to compute average frequency of non-MCV
items. Essentially what var_eq_const does:
if (otherdistinct > 1)
selec /= otherdistinct;
Not sure how to do that when there are not just equality clauses.
BTW I think there's a bug in handling the fullmatch flag - it should not
be passed to AND/OR subclauses the way it is, because then
WHERE a=1 OR (a=2 AND b=2)
will probably set it to 'true' because of (a=2 AND b=2). Which will
short-circuit the statext_clauselist_selectivity, forcing it to ignore
the non-MCV part.
But that's something I need to look at more closely tomorrow. Another
thing I probably need to do is to add more regression tests, protecting
against bugs similar to those you found.
Thanks for the feedback so far!
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
|Next Message||David Rowley||2018-03-27 00:38:42||Re: Parallel Aggregates for string_agg and array_agg|
|Previous Message||Alvaro Herrera||2018-03-27 00:27:40||Re: ppc64le support in 9.3 branch?|