From: | Mat Arye <mat(at)timescale(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Additional Statistics Hooks |
Date: | 2018-03-13 16:20:27 |
Message-ID: | CADsUR0BC4E_n=msYGzcBa4M_crvpO6qqyx5eQuEveOA3BD+PjA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 13, 2018 at 6:31 AM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 13 March 2018 at 11:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution. On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions. I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.
>
I don't think this is either-or. A general hook can be useful for extensions
that want to optimize particular data distributions or workloads using
domain knowledge about the functions common to those workloads. That way,
users working with that data can use extensions to optimize their workloads
without writing C themselves. I also think a protransform-like feature would
add a lot of power to the native planner, but it could take a while to get
into core properly and may not handle all kinds of data distributions or
cases.
One example of a case that a protransform-type system would not be able to
optimize is a mathematical operator expression, such as bucketing integers by
decile: (integer / 10) * 10. This is somewhat analogous to date_trunc in the
integer space, and it would also change the number of resulting distinct rows.
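
To make the distinct-row point concrete, here is a rough sketch in Python (not
planner code). The 1000-value column and the helper name bucket_by_ten are
made up for illustration, and the values are non-negative so that Python's
floor division matches SQL integer division:

```python
# Illustrative sketch only: how (integer / 10) * 10 shrinks the number of
# distinct values. The data and the helper name are hypothetical.

def bucket_by_ten(value: int) -> int:
    """Mirror the expression (integer / 10) * 10 for non-negative integers."""
    return (value // 10) * 10

# Hypothetical column holding 1000 distinct non-negative integers.
values = list(range(1000))

distinct_before = len(set(values))
distinct_after = len({bucket_by_ten(v) for v in values})

print(distinct_before, distinct_after)
```

An estimator that assumes the expression keeps the column's distinct count
would be off by a factor of ten in this toy case, which is the kind of
estimate a hook could adjust.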
>
> I always imagined that extended statistics could be used for this.
> Right now the estimates are much better when you create an index on
> the function, but there's no real reason to limit the stats that are
> gathered to just plain columns + expression indexes.
>
> I believe I'm not the only person to have considered this. Originally
> extended statistics were named multivariate statistics. I think it was
> Dean and I (maybe others too) that suggested to Tomas to give the
> feature a more generic name so that it can be used for a more general
> purpose later.
>
I also think the point about extended statistics is a good one, and it points
to the need for more experimentation and experience, which I think a C hook is
better suited for. Putting in a hook would allow extension writers like us to
experiment and figure out which kinds of transformations on statistics are
useful, while keeping a small footprint in core. I think designing a
protransform-like system would benefit from more experience with the kinds of
transformations that are useful. For example, can anything be done if the
interval passed to date_trunc is not constant, or is that case not even worth
bothering with? Maybe extended statistics is a better approach, etc.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>