Re: Planner estimates and cast operations ,...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner estimates and cast operations ,...
Date: 2006-09-04 14:57:20
Message-ID: 23903.1157381840@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hans-Juergen Schoenig <postgres(at)cybertec(dot)at> writes:
> consider the following:

> SELECT some_timestamp::date FROM very_large_table GROUP BY
> some_timestamp::date

> my very_large_table is around 1billion entries.
> the problem is: the planner has a problem here as it is taking the
> (correct) estimates for timestamp. this avoids a HashAggregate
> because the dataset seems to large for work_mem.
> what the planner cannot know is that the number of days is quite
> limited (in my case around 1000 different values).
> i wonder how to teach the planner to take the cast into consideration.

Create an index on that expression.

regression=# create table foo(x) as select x * '864 sec'::interval + now()::timestamp from generate_series(1,10000) x;
SELECT
regression=# analyze foo;
ANALYZE
regression=# explain select x::date from foo group by x::date;
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=205.00..330.00 rows=10000 width=8)
-> Seq Scan on foo (cost=0.00..180.00 rows=10000 width=8)
(2 rows)

regression=# create index fooi on foo((x::date));
CREATE INDEX
regression=# analyze foo;
ANALYZE
regression=# explain select x::date from foo group by x::date;
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=205.00..206.26 rows=101 width=8)
-> Seq Scan on foo (cost=0.00..180.00 rows=10000 width=8)
(2 rows)

regression=#

I had to cheat a little bit here: I tried to do this example with a
timestamptz column, and the index creation failed because timestamptz to
date isn't immutable (it depends on TimeZone). If yours is too, you
could perhaps do something involving AT TIME ZONE to generate an
immutable conversion to date.

It would perhaps make sense to provide a way to cue ANALYZE to compute
stats on expressions that aren't actually being indexed, but I see no
good reason to limit our attention to cast expressions.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2006-09-04 15:07:46 pgsql: sslinfo contrib module - information about current SSL
Previous Message Michael Meskes 2006-09-04 14:49:37 Re: [PATCHES] possible ecpg vpath build error