Re: Additional Statistics Hooks

From: Mat Arye <mat(at)timescale(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional Statistics Hooks
Date: 2018-03-13 16:20:27
Message-ID: CADsUR0BC4E_n=msYGzcBa4M_crvpO6qqyx5eQuEveOA3BD+PjA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 13, 2018 at 6:31 AM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 13 March 2018 at 11:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution. On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions. I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.
>

I don't think this is either-or. A general hook can be useful for
extensions that want to optimize particular data distributions or
workloads using domain knowledge about the functions common to those
workloads. That way, users working with that data can use extensions to
optimize their workloads without writing C themselves. I also think a
protransform-like feature would add a lot of power to the native planner,
but it could take a while to get into core properly and may not handle
all kinds of data distributions and cases.

An example of a case that a protransform-type system would not be able to
optimize is a mathematical operator expression, such as bucketing integers
by decile: (integer / 10) * 10. This is somewhat analogous to date_trunc in
the integer space, and it would also change the number of resulting
distinct rows.
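
To make the effect on distinct counts concrete, here is a small Python
sketch. It is only a toy model, not planner code: the function names and
the estimator are hypothetical, and it assumes non-negative integers so
that Python's floor division matches the truncating integer division in
the expression above. The estimator caps the column's distinct count by
the number of buckets spanned by the column's min and max, which is
roughly the kind of adjustment a hook could make.

```python
# Toy illustration only; none of these names exist in PostgreSQL.

def bucket_by_ten(value: int) -> int:
    """Mirror (integer / 10) * 10 for non-negative integers."""
    return (value // 10) * 10


def estimate_bucketed_ndistinct(col_min: int, col_max: int,
                                col_ndistinct: int) -> int:
    """Hypothetical estimator: the bucketed expression cannot have more
    distinct values than the column itself, nor more than the number of
    buckets between the column's minimum and maximum."""
    buckets_in_range = (bucket_by_ten(col_max) - bucket_by_ten(col_min)) // 10 + 1
    return min(col_ndistinct, buckets_in_range)


# A column holding every integer from 0 to 999: 1000 distinct values.
values = list(range(0, 1000))

# Distinct values actually produced by the bucketing expression.
actual_ndistinct = len({bucket_by_ten(v) for v in values})

# Estimate derived only from min, max, and the column's distinct count.
estimated_ndistinct = estimate_bucketed_ndistinct(
    min(values), max(values), len(set(values))
)

print(actual_ndistinct, estimated_ndistinct)
```

In this uniform case the estimate matches the true count. With gaps in
the data the bucket span would overestimate, which is one reason more
experimentation seems useful.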

>
> I always imagined that extended statistics could be used for this.
> Right now the estimates are much better when you create an index on
> the function, but there's no real reason to limit the stats that are
> gathered to just plain columns + expression indexes.
>
> I believe I'm not the only person to have considered this. Originally
> extended statistics were named multivariate statistics. I think it was
> Dean and I (maybe others too) that suggested to Tomas to give the
> feature a more generic name so that it can be used for a more general
> purpose later.
>

I also think the point about extended statistics is a good one. It
points to the need for more experimentation and experience, which I think
a C hook is better suited for. Putting in a hook would allow extension
writers like us to experiment and figure out which kinds of transforms on
statistics are useful, while keeping a small footprint in core. I think
designing a protransform-like system would benefit from more experience
with the kinds of transformations that are useful. For example, can
anything be done if the interval passed to date_trunc is not constant, or
is that case not even worth bothering with? Maybe extended statistics is a
better approach, etc.

>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
