Re: Additional improvements to extended statistics

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional improvements to extended statistics
Date: 2020-12-08 12:46:57
Message-ID: 4ba455c6-a0fa-cae7-7bc3-4aa5b6cd11d4@enterprisedb.com
Lists: pgsql-hackers

On 12/7/20 5:15 PM, Dean Rasheed wrote:
> On Wed, 2 Dec 2020 at 15:51, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>
>> The sort of queries I had in mind were things like this:
>>
>> WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2)
>>
>> However, the new code doesn't apply the extended stats directly using
>> clauselist_selectivity_or() for this kind of query because there are
>> no RestrictInfos for the nested AND clauses, so
>> find_single_rel_for_clauses() (and similarly
>> statext_is_compatible_clause()) regards those clauses as not
>> compatible with extended stats. So what ends up happening is that
>> extended stats are used only when we descend down to the two AND
>> clauses, and their results are combined using the original "s1 + s2 -
>> s1 * s2" formula. That actually works OK in this case, because there
>> is no overlap between the two AND clauses, but it wouldn't work so
>> well if there was.
>>
>> I'm pretty sure that can be fixed by teaching
>> find_single_rel_for_clauses() and statext_is_compatible_clause() to
>> handle BoolExpr clauses, looking for RestrictInfos underneath them,
>> but I think that should be left for a follow-on patch.
>
> Attached is a patch doing that, which improves a couple of the
> estimates for queries with AND clauses underneath OR clauses, as
> expected.
>
> This also revealed a minor bug in the way that the estimates for
> multiple statistics objects were combined while processing an OR
> clause -- the estimates for the overlaps between clauses only apply
> for the current statistics object, so we really have to combine the
> estimates for each set of clauses for each statistics object as if
> they were independent of one another.
>
> 0001 fixes the multiple-extended-stats issue for OR clauses, and 0002
> improves the estimates for sub-AND clauses underneath OR clauses.
>

Cool! Thanks for taking the time to investigate and fix those. Both
patches seem fine to me.
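
For reference, here is a minimal Python sketch (not code from either
patch) of why the "s1 + s2 - s1 * s2" formula is fine when the two AND
clauses don't overlap but degrades when they do. The function names and
the selectivity values are made up for illustration, and
combine_independent() is only one reading of "as if they were
independent of one another" from the 0001 description.

```python
def or_independent(s1, s2):
    """Selectivity of (A OR B), assuming A and B are independent."""
    return s1 + s2 - s1 * s2


def or_with_overlap(s1, s2, s_both):
    """Inclusion-exclusion: P(A OR B) = P(A) + P(B) - P(A AND B)."""
    return s1 + s2 - s_both


def combine_independent(estimates):
    """Fold several OR estimates together, treating each as independent.

    Hypothetical helper: each element stands for the estimate obtained
    from one statistics object for the clauses it covers.
    """
    total = 0.0
    for s in estimates:
        total = or_independent(total, s)
    return total


# Hypothetical selectivities for (a = 1 AND b = 1) and (a = 2 AND b = 2).
s1, s2 = 0.10, 0.10

# The two clauses cannot both be true, so the true overlap is 0.
exact_disjoint = or_with_overlap(s1, s2, 0.0)

# The independence formula assumes an overlap of s1 * s2 instead.
assumed_independent = or_independent(s1, s2)

# If both clauses matched exactly the same rows, the overlap would be s1.
exact_full_overlap = or_with_overlap(s1, s2, s1)

print(exact_disjoint, assumed_independent, exact_full_overlap)
```

With small selectivities the disjoint case is off only by s1 * s2
(about 0.19 estimated versus 0.20 actual), while in the fully
overlapping case the same formula still gives about 0.19 against an
actual 0.10.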

> These are both quite small patches, that hopefully won't interfere
> with any of the other extended stats patches.
>

I haven't tried it, but they should not interfere with the other patches
much.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
