|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>|
|Cc:||PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Additional improvements to extended statistics|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On Sun, Mar 15, 2020 at 12:37:37PM +0000, Dean Rasheed wrote:
>On Sun, 15 Mar 2020 at 00:08, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On Sat, Mar 14, 2020 at 05:56:10PM +0100, Tomas Vondra wrote:
>> >Attached is a patch series rebased on top of the current master, after
>> >committing the ScalarArrayOpExpr enhancements. I've updated the OR patch
>> >to get rid of the code duplication, and barring objections I'll get it
>> >committed shortly together with the two parts improving test coverage.
>> I've pushed the two patches improving test coverage for functional
>> dependencies and MCV lists, which seems mostly non-controversial. I'll
>> wait a bit more with the two patches actually changing behavior (rebased
>> version attached, to keep cputube happy).
>Patch 0001 looks to be mostly ready. Just a couple of final comments:
>+ if (is_or)
>+ simple_sel = clauselist_selectivity_simple_or(root,
>sjinfo, NULL, 1.0);
>Surely that should be passing 0.0 as the final argument, otherwise it
>will always just return simple_sel = 1.0.
>+ * XXX We can't multiply with current value, because for OR clauses
>+ * we start with 0.0, so we simply assign to s1 directly.
>+ s = statext_clauselist_selectivity(root, clauses, varRelid,
>+ jointype, sjinfo, rel,
>+ &estimatedclauses, true);
>That final part of the comment is no longer relevant (variable s1 no
>longer exists). Probably it could now just be deleted, since I think
>there are sufficient comments elsewhere to explain what's going on.
>Otherwise it looks good, and I think this will lead to some very
Attached is a rebased patch series, addressing both those issues.
I've been wondering why none of the regression tests failed because of
the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to
make the tests stable, all the MCV lists we use are "perfect" i.e. it
represents 100% of the data. But this selectivity is used to compute
selectivity only for the part not represented by the MCV list, i.e. it's
not really used. I suppose we could add a test that would use larger
MCV item, but I'm afraid that'd be inherently unstable :-(
Another thing I was thinking about is the changes to the API. We need to
pass information whether the clauses are connected by AND or OR to a
number of places, and 0001 does that in two ways. For some functions it
adds a new parameter (called is_or), and for other functiosn it creates
a new copy of a function. So for example
got the new flag, while e.g. clauselist_selectivity gets a new "copy"
sibling called clauselist_selectivity_or.
There were two reasons for not using flag. First, clauselist_selectivity
and similar functions have to do very different stuff for these two
cases, so it'd be just one huge if/else block. Second, minimizing
breakage of third-party code - pretty much all the extensions I've seen
only work with AND clauses, and call clauselist_selectivity. Adding a
flag would break that code. (Also, there's a bit of laziness, because
this was the simplest thing to do during development.)
But I wonder if that's sufficient reason - maybe we should just add the
flag in all cases. It might break some code, but the fix is trivial (add
a false there).
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
|Next Message||Alvaro Herrera||2020-03-18 19:50:38||Re: Minimal logical decoding on standbys|
|Previous Message||Julien Rouhaud||2020-03-18 19:24:00||Re: WAL usage calculation patch|