Quick Links

Re: Additional improvements to extended statistics

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Additional improvements to extended statistics
Date:	2020-03-18 19:31:28
Message-ID:	20200318193128.53soruotqdqnlaoy@development
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, Mar 15, 2020 at 12:37:37PM +0000, Dean Rasheed wrote:
>On Sun, 15 Mar 2020 at 00:08, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Sat, Mar 14, 2020 at 05:56:10PM +0100, Tomas Vondra wrote:
>> >
>> >Attached is a patch series rebased on top of the current master, after
>> >committing the ScalarArrayOpExpr enhancements. I've updated the OR patch
>> >to get rid of the code duplication, and barring objections I'll get it
>> >committed shortly together with the two parts improving test coverage.
>>
>> I've pushed the two patches improving test coverage for functional
>> dependencies and MCV lists, which seems mostly non-controversial. I'll
>> wait a bit more with the two patches actually changing behavior (rebased
>> version attached, to keep cputube happy).
>>
>
>Patch 0001 looks to be mostly ready. Just a couple of final comments:
>
>+ if (is_or)
>+ simple_sel = clauselist_selectivity_simple_or(root,
>stat_clauses, varRelid,
>+ jointype,
>sjinfo, NULL, 1.0);
>+ else
>
>Surely that should be passing 0.0 as the final argument, otherwise it
>will always just return simple_sel = 1.0.
>
>
>+ *
>+ * XXX We can't multiply with current value, because for OR clauses
>+ * we start with 0.0, so we simply assign to s1 directly.
>+ */
>+ s = statext_clauselist_selectivity(root, clauses, varRelid,
>+ jointype, sjinfo, rel,
>+ &estimatedclauses, true);
>
>That final part of the comment is no longer relevant (variable s1 no
>longer exists). Probably it could now just be deleted, since I think
>there are sufficient comments elsewhere to explain what's going on.
>
>Otherwise it looks good, and I think this will lead to some very
>worthwhile improvements.
>

Attached is a rebased patch series, addressing both those issues.

I've been wondering why none of the regression tests failed because of
the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to
make the tests stable, all the MCV lists we use are "perfect" i.e. it
represents 100% of the data. But this selectivity is used to compute
selectivity only for the part not represented by the MCV list, i.e. it's
not really used. I suppose we could add a test that would use larger
MCV item, but I'm afraid that'd be inherently unstable :-(

Another thing I was thinking about is the changes to the API. We need to
pass information whether the clauses are connected by AND or OR to a
number of places, and 0001 does that in two ways. For some functions it
adds a new parameter (called is_or), and for other functiosn it creates
a new copy of a function. So for example

- statext_mcv_clauselist_selectivity
- statext_clauselist_selectivity

got the new flag, while e.g. clauselist_selectivity gets a new "copy"
sibling called clauselist_selectivity_or.

There were two reasons for not using flag. First, clauselist_selectivity
and similar functions have to do very different stuff for these two
cases, so it'd be just one huge if/else block. Second, minimizing
breakage of third-party code - pretty much all the extensions I've seen
only work with AND clauses, and call clauselist_selectivity. Adding a
flag would break that code. (Also, there's a bit of laziness, because
this was the simplest thing to do during development.)

But I wonder if that's sufficient reason - maybe we should just add the
flag in all cases. It might break some code, but the fix is trivial (add
a false there).

Opinions?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
0001-Improve-estimation-of-OR-clauses-using-exte-20200318.patch	text/plain	20.6 KB
0002-Support-clauses-of-the-form-Var-op-Var-20200318.patch	text/plain	17.2 KB

In response to

Re: Additional improvements to extended statistics at 2020-03-15 12:37:37 from Dean Rasheed

Responses

Re: Additional improvements to extended statistics at 2020-03-19 19:08:07 from Dean Rasheed

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2020-03-18 19:50:38	Re: Minimal logical decoding on standbys
Previous Message	Julien Rouhaud	2020-03-18 19:24:00	Re: WAL usage calculation patch