Re: Additional improvements to extended statistics

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional improvements to extended statistics
Date: 2020-01-14 08:16:50
Message-ID: CAFj8pRAYwW+E9+ujrr+D5RMGtObczHo6Kkx5VRT-aJgTG7Lv8Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, 14 Jan 2020 at 0:00, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> Hi,
>
> Now that I've committed [1] which allows us to use multiple extended
> statistics per table, I'd like to start a thread discussing a couple of
> additional improvements for extended statistics. I've considered
> starting a separate patch for each, but that would be messy as those
> changes will touch roughly the same places. So I've organized it into a
> single patch series, with the simpler parts at the beginning.
>
> There are three main improvements:
>
> 1) improve estimates of OR clauses
>
> Until now, OR clauses pretty much ignored extended statistics, based on
> the experience that they're less vulnerable to misestimates. But it's a
> bit weird that AND clauses are handled while OR clauses are not, so this
> extends the logic to OR clauses.
>
> Status: I think this is fairly OK.
>
>
> 2) support estimating clauses (Var op Var)
>
> Currently, we only support clauses with a single Var, i.e. clauses like
>
> - Var op Const
> - Var IS [NOT] NULL
> - [NOT] Var
> - ...
>
> and AND/OR clauses built from those simple ones. This patch adds support
> for clauses of the form (Var op Var), of course assuming both Vars come
> from the same relation.
>
> Status: This works, but it feels a bit hackish. Needs more work.
>
>
> 3) support extended statistics on expressions
>
> Currently we only allow simple references to columns in extended stats,
> so we can do
>
> CREATE STATISTICS s ON a, b, c FROM t;
>
> but not
>
> CREATE STATISTICS s ON (a+b), (c + 1) FROM t;
>

+1 for statistics on expressions - it could be a great feature.

Pavel

> This patch aims to allow this. At the moment it's a WIP - it does most
> of the catalog changes and stats building, but with some hacks/bugs. And
> it does not even try to use those statistics during estimation.
>
> The first question is how to extend the current pg_statistic_ext catalog
> to support expressions. I've been planning to do it the way we support
> expressions for indexes, i.e. have two catalog fields - one for keys,
> one for expressions.
>
> One difference is that for statistics we don't care about order of the
> keys, so that we don't need to bother with storing 0 keys in place for
> expressions - we can simply assume keys are first, then expressions.
>
> And this is what the patch does now.
>
> I'm however wondering whether to keep this split - why not just treat
> everything as expressions, and be done with it? A key just represents a
> Var expression, after all. And it would massively simplify a lot of code
> that now has to care about both keys and expressions.
>
> Of course, expressions are a bit more expensive, but I wonder how
> noticeable that would be.
>
> Opinions?
>
>
> regards
>
> [1] https://commitfest.postgresql.org/26/2320/
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
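
To make point 1 of the quoted message concrete, here is a minimal Python
sketch of why OR clauses tend to be less vulnerable to misestimates than
AND clauses when columns are correlated. It is an illustration only, not
code from the patch: the table contents (100000 rows with a = b = i % 100),
the function names, and the use of the usual independence-based
combination P(A) + P(B) - P(A)*P(B) are all assumptions made for the
example.

```python
# Hypothetical data: two perfectly correlated columns, a = b = i % 100.
rows = [(i % 100, i % 100) for i in range(100_000)]
n = len(rows)


def fraction(predicate):
    """Fraction of rows satisfying the predicate (the 'true' selectivity)."""
    return sum(1 for a, b in rows if predicate(a, b)) / n


def and_independent(p_a, p_b):
    """AND estimate assuming the two clauses are independent."""
    return p_a * p_b


def or_independent(p_a, p_b):
    """OR estimate by inclusion-exclusion, assuming independence."""
    return p_a + p_b - p_a * p_b


def or_with_joint(p_a, p_b, p_ab):
    """OR estimate when the joint selectivity P(A and B) is known,
    e.g. from some multi-column statistic (assumed available here)."""
    return p_a + p_b - p_ab


# Per-column selectivities, as single-column statistics would provide.
p_a = fraction(lambda a, b: a == 1)
p_b = fraction(lambda a, b: b == 1)

# True selectivities of the combined clauses.
actual_and = fraction(lambda a, b: a == 1 and b == 1)
actual_or = fraction(lambda a, b: a == 1 or b == 1)

# Estimates under the independence assumption.
est_and = and_independent(p_a, p_b)
est_or = or_independent(p_a, p_b)

# Estimate for OR once the joint selectivity is taken into account.
est_or_joint = or_with_joint(p_a, p_b, actual_and)

# Error factors: how far each independence-based estimate is from the truth.
and_error = actual_and / est_and
or_error = est_or / actual_or

print(f"AND: actual={actual_and:.4f} estimate={est_and:.6f}")
print(f"OR:  actual={actual_or:.4f} estimate={est_or:.6f}")
print(f"OR with joint selectivity: {est_or_joint:.4f}")
```

With this made-up data the independence-based AND estimate is too low by
a factor of about 100, while the OR estimate is too high by a factor of
about 2. That matches the observation that OR clauses suffer less, and
also shows there is still an error that knowing the joint selectivity
would remove.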
