Re: Additional improvements to extended statistics

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional improvements to extended statistics
Date: 2020-12-07 16:15:56
Message-ID: CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2 Dec 2020 at 15:51, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> The sort of queries I had in mind were things like this:
>
> WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2)
>
> However, the new code doesn't apply the extended stats directly using
> clauselist_selectivity_or() for this kind of query because there are
> no RestrictInfos for the nested AND clauses, so
> find_single_rel_for_clauses() (and similarly
> statext_is_compatible_clause()) regards those clauses as not
> compatible with extended stats. So what ends up happening is that
> extended stats are used only when we descend down to the two AND
> clauses, and their results are combined using the original "s1 + s2 -
> s1 * s2" formula. That actually works OK in this case, because there
> is no overlap between the two AND clauses, but it wouldn't work so
> well if there was.
>
> I'm pretty sure that can be fixed by teaching
> find_single_rel_for_clauses() and statext_is_compatible_clause() to
> handle BoolExpr clauses, looking for RestrictInfos underneath them,
> but I think that should be left for a follow-in patch.

Attached is a patch doing that, which improves a couple of the
estimates for queries with AND clauses underneath OR clauses, as
expected.

This also revealed a minor bug in the way that the estimates for
multiple statistics objects were combined while processing an OR
clause -- the estimates for the overlaps between clauses only apply
for the current statistics object, so we really have to combine the
estimates for each set of clauses for each statistics object as if
they were independent of one another.

0001 fixes the multiple-extended-stats issue for OR clauses, and 0002
improves the estimates for sub-AND clauses underneath OR clauses.

These are both quite small patches, that hopefully won't interfere
with any of the other extended stats patches.

Regards,
Dean

Attachment Content-Type Size
0001-Improve-estimation-of-OR-clauses-using-multiple-exte.patch application/octet-stream 3.3 KB
0002-Improve-estimation-of-ANDs-under-ORs-using-extended-.patch application/octet-stream 3.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-12-07 16:16:00 Re: Huge memory consumption on partitioned table with FKs
Previous Message Dean Rasheed 2020-12-07 16:02:00 Re: PoC/WIP: Extended statistics on expressions