Quick Links

Re: planner support functions: handle GROUP BY estimates ?

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: planner support functions: handle GROUP BY estimates ?
Date:	2019-12-26 21:32:50
Message-ID:	20191226213250.GB12890@telsasoft.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, Dec 22, 2019 at 06:16:48PM -0600, Justin Pryzby wrote:
> On Tue, Nov 19, 2019 at 01:34:21PM -0600, Justin Pryzby wrote:
> > Tom implemented "Planner support functions":
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a391ff3c3d418e404a2c6e4ff0865a107752827b
> > https://www.postgresql.org/docs/12/xfunc-optimization.html
> >
> > I wondered whether there was any consideration to extend that to allow
> > providing improved estimates of "group by". That currently requires manually
> > by creating an expression index, if the function is IMMUTABLE (which is not
> > true for eg. date_trunc of timestamptz).
>
> I didn't hear back so tried implementing this for date_trunc(). Currently, the

> I currently assume that the input data has 1 second granularity:
...
> If the input timestamps have (say) hourly granularity, rowcount will be
> *underestimated* by 3600x, which is worse than the behavior in master of
> overestimating by (for "day") 24x.
>
> I'm trying to think of ways to address that:

In the attached, I handled that by using histogram and variable's initial
ndistinct estimate, giving good estimates even for intermediate granularities
of input timestamps.

|postgres=# DROP TABLE IF EXISTS t; CREATE TABLE t(i) AS SELECT a FROM generate_series(now(), now()+'11 day'::interval, '15 minutes')a,generate_series(1,9)b; ANALYZE t;
|
|postgres=# explain analyze SELECT date_trunc('hour',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..188.99 rows=264 width=8) (actual time=42.110..42.317 rows=265 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('minute',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..198.91 rows=1057 width=8) (actual time=41.685..42.264 rows=1057 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('day',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..185.83 rows=11 width=8) (actual time=46.672..46.681 rows=12 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('second',i) i FROM t GROUP BY 1;
| HashAggregate (cost=185.69..198.91 rows=1057 width=8) (actual time=41.816..42.435 rows=1057 loops=1)

Attachment	Content-Type	Size
v2-0001-Planner-support-functions-for-GROUP-BY-f.patch	text/x-diff	11.2 KB
v2-0002-Pass-ndistinct-and-minmax-to-allow-good-estimates.patch	text/x-diff	10.0 KB

In response to

Re: planner support functions: handle GROUP BY estimates ? at 2019-12-23 00:16:48 from Justin Pryzby

Responses

Re: planner support functions: handle GROUP BY estimates ? at 2020-01-14 20:12:21 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2019-12-26 21:58:37	Re: Libpq support to connect to standby server as priority
Previous Message	Tom Lane	2019-12-26 20:29:56	Re: Calling PLpgSQL function with composite type fails with an error: "ERROR: could not open relation with OID ..."