Re: PoC/WIP: Extended statistics on expressions

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2020-12-07 16:02:00
Message-ID: CAEZATCWmQOFUqo4GXPhGsYGrbhYq0J4ZMz8B-LZXptu7S2y2GQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, 7 Dec 2020 at 14:15, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 12/7/20 10:56 AM, Dean Rasheed wrote:
> > it might actually be
> > neater to have separate documented syntaxes for single- and
> > multi-column statistics:
> >
> >   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> >     ON (expression)
> >     FROM table_name
> >
> >   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> >     [ ( statistics_kind [, ... ] ) ]
> >     ON { column_name | (expression) } , { column_name | (expression) } [, ...]
> >     FROM table_name
>
> I think it makes sense in general. I see two issues with this approach,
> though:
>
> * By adding expression/standard stats for individual statistics, it
> makes the list of statistics longer - I wonder if this might have
> measurable impact on lookups in this list.
>
> * I'm not sure it's a good idea that the second syntax would always
> build the per-expression stats. Firstly, it seems a bit strange that it
> behaves differently than the other kinds. Secondly, I wonder if there
> are cases where it'd be desirable to explicitly disable building these
> per-expression stats. For example, what if we have multiple extended
> statistics objects, overlapping on a couple expressions. It seems
> pointless to build the stats for all of them.
>

Hmm, I'm not sure it would really be a good idea to build MCV stats on
expressions without also building the standard stats for those
expressions. Without them, the assumptions that
mcv_combine_selectivities() makes about simple_sel and mcv_basesel
wouldn't really hold. But then, if multiple MCV stats shared the same
expression, it would be quite wasteful to build standard stats on the
expression more than once.
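
To make the dependence on simple_sel and mcv_basesel concrete, here is a
rough sketch of the arithmetic as I understand it, paraphrased in Python
rather than copied from the C code. The function names and all the numbers
are made up for illustration. simple_sel is taken to be the estimate from
the standard (per-expression) stats, mcv_sel the frequency of the matching
MCV items, mcv_basesel the estimate for those same items from the standard
stats alone, and mcv_totalsel the total frequency covered by the MCV list.

```python
def clamp_probability(p):
    """Clamp a value into the range [0, 1]."""
    return min(1.0, max(0.0, p))


def combine_selectivities(simple_sel, mcv_sel, mcv_basesel, mcv_totalsel):
    """Illustrative paraphrase only; not the actual PostgreSQL source."""
    # Part of the simple estimate that is not explained by the MCV matches.
    other_sel = clamp_probability(simple_sel - mcv_basesel)
    # The non-MCV part cannot exceed what the MCV list leaves uncovered.
    other_sel = min(other_sel, 1.0 - mcv_totalsel)
    # Overall estimate: MCV part plus non-MCV part.
    return clamp_probability(mcv_sel + other_sel)


# Hypothetical case 1: simple_sel comes from real per-expression stats, so it
# is on the same footing as mcv_basesel and the difference is meaningful.
with_stats = combine_selectivities(0.10, 0.30, 0.08, 0.60)

# Hypothetical case 2: no per-expression stats, so simple_sel is only a small
# default guess. It falls below mcv_basesel and the non-MCV part clamps to 0.
without_stats = combine_selectivities(0.005, 0.30, 0.08, 0.60)
```

In the second case the non-MCV part of the estimate is silently lost, which
is the kind of breakage I would expect if simple_sel and mcv_basesel were
not derived from the same standard stats.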

It feels like it should build a single extended stats object for each
unique expression, with appropriate dependencies for any MCV stats
that used those expressions, but I'm not sure how complex that would
be. Dropping the last MCV stat object using a standard expression stat
object might reasonably drop the expression stats ... except if they
were explicitly created by the user, independently of any MCV stats.
That could get quite messy.
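
The bookkeeping this implies can be sketched as a toy model in Python. It
is purely hypothetical: ExprStatsRegistry and its methods are invented
names, and a real implementation would presumably go through the catalog
dependency machinery rather than an in-memory dict. The sketch only shows
where the messiness comes from, namely that each expression stats object
needs to remember both who depends on it and whether the user created it
explicitly.

```python
class ExprStatsRegistry:
    """Toy model: one expression stats object per unique expression."""

    def __init__(self):
        # expression text -> {"explicit": bool, "dependents": set of names}
        self.expr_stats = {}

    def _entry(self, expr):
        return self.expr_stats.setdefault(
            expr, {"explicit": False, "dependents": set()})

    def create_expression_stats(self, expr):
        # The user asked for stats on this expression directly.
        self._entry(expr)["explicit"] = True

    def create_mcv_stats(self, name, exprs):
        # Reuse existing expression stats, or create them implicitly.
        for expr in exprs:
            self._entry(expr)["dependents"].add(name)

    def drop_mcv_stats(self, name):
        for expr in list(self.expr_stats):
            entry = self.expr_stats[expr]
            entry["dependents"].discard(name)
            # Drop implicit expression stats once nothing uses them,
            # but never drop ones the user created explicitly.
            if not entry["dependents"] and not entry["explicit"]:
                del self.expr_stats[expr]


reg = ExprStatsRegistry()
reg.create_mcv_stats("s1", ["a + b", "c * 2"])
reg.create_mcv_stats("s2", ["a + b"])      # shares "a + b" with s1
reg.create_expression_stats("c * 2")       # explicit, independent of any MCV
reg.drop_mcv_stats("s1")                   # both expression stats survive
reg.drop_mcv_stats("s2")                   # only the explicit one survives
```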

Regards,
Dean
