Re: Additional improvements to extended statistics

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional improvements to extended statistics
Date: 2020-03-23 08:21:42
Message-ID: CAEZATCW3dy1g=tEmP3oW-4FH=Lpp516xfXfQyPXiEYm4v3hJfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 21 Mar 2020 at 21:59, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Ah, right. Yeah, I think that should work. I thought there would be some
> volatility due to groups randomly not making it into the MCV list, but
> you're right it's possible to construct the data in a way to make it
> perfectly deterministic. So that's what I've done in the attached patch.
>

Looking through those new tests, another issue occurred to me -- under
some circumstances this patch can lead to extended stats being applied
twice to the same clause, which is not great, because it involves
quite a lot of extra work, and also because it can lead to
overestimates because of the way that MCV stats are applied as a delta
correction to simple_sel.

The way this comes about is as follows -- if we start with an OR
clause, that will be passed as a single-item implicitly AND'ed list to
clauselist_selectivity(), and from there to
statext_clauselist_selectivity() and then to
statext_mcv_clauselist_selectivity(). This will call
clauselist_selectivity_simple() to get simple_sel, before calling
mcv_clauselist_selectivity(), which will recursively compute all the
MCV corrections. However, the call to clauselist_selectivity_simple()
will call clause_selectivity() for each clause (just a single OR
clause in this example), which will now call
clauselist_selectivity_or(), which will go back into
statext_clauselist_selectivity() with "is_or = true", which will apply
the MCV corrections to the same set of clauses that the outer call was
about to process.

I'm not sure what's the best way to resolve that. Perhaps
statext_mcv_clauselist_selectivity() / statext_is_compatible_clause()
should ignore OR clauses from an AND-list, on the basis that they will
get processed recursively later. Or perhaps estimatedclauses can
somehow be used to prevent this, though I'm not sure exactly how that
would work.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-03-23 08:32:08 Re: color by default
Previous Message Kyotaro Horiguchi 2020-03-23 08:20:27 Re: [HACKERS] WAL logging problem in 9.4.3?