Re: planner support functions: handle GROUP BY estimates ?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: planner support functions: handle GROUP BY estimates ?
Date: 2019-12-26 21:32:50
Message-ID: 20191226213250.GB12890@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 22, 2019 at 06:16:48PM -0600, Justin Pryzby wrote:
> On Tue, Nov 19, 2019 at 01:34:21PM -0600, Justin Pryzby wrote:
> > Tom implemented "Planner support functions":
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a391ff3c3d418e404a2c6e4ff0865a107752827b
> > https://www.postgresql.org/docs/12/xfunc-optimization.html
> >
> > I wondered whether there was any consideration to extend that to allow
> > providing improved estimates of "group by". That currently requires manually
> > by creating an expression index, if the function is IMMUTABLE (which is not
> > true for eg. date_trunc of timestamptz).
>
> I didn't hear back so tried implementing this for date_trunc(). Currently, the

> I currently assume that the input data has 1 second granularity:
...
> If the input timestamps have (say) hourly granularity, rowcount will be
> *underestimated* by 3600x, which is worse than the behavior in master of
> overestimating by (for "day") 24x.
>
> I'm trying to think of ways to address that:

In the attached, I handled that by using histogram and variable's initial
ndistinct estimate, giving good estimates even for intermediate granularities
of input timestamps.

|postgres=# DROP TABLE IF EXISTS t; CREATE TABLE t(i) AS SELECT a FROM generate_series(now(), now()+'11 day'::interval, '15 minutes')a,generate_series(1,9)b; ANALYZE t;
|
|postgres=# explain analyze SELECT date_trunc('hour',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..188.99 rows=264 width=8) (actual time=42.110..42.317 rows=265 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('minute',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..198.91 rows=1057 width=8) (actual time=41.685..42.264 rows=1057 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('day',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..185.83 rows=11 width=8) (actual time=46.672..46.681 rows=12 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('second',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..198.91 rows=1057 width=8) (actual time=41.816..42.435 rows=1057 loops=1)

Attachment Content-Type Size
v2-0001-Planner-support-functions-for-GROUP-BY-f.patch text/x-diff 11.2 KB
v2-0002-Pass-ndistinct-and-minmax-to-allow-good-estimates.patch text/x-diff 10.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-12-26 21:58:37 Re: Libpq support to connect to standby server as priority
Previous Message Tom Lane 2019-12-26 20:29:56 Re: Calling PLpgSQL function with composite type fails with an error: "ERROR: could not open relation with OID ..."