Re: Planner estimates and cast operations ,...

From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner estimates and cast operations ,...
Date: 2006-09-04 15:19:37
Message-ID: 14AF07E7-CEFC-4BC3-96C5-CFC394D7DE87@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hi tom ...

i thought about creating an index on the expression but the problem
is that this is hardly feasable.
in 8.0 (what i have here) this would block the table and i would run
out of disk space as well. this is a 600 gb biest :(

what about the planner approach?
this would solve the problem for some other issues as well. an index
might not be flexible enough :(.

many thanks,

hans

On Sep 4, 2006, at 4:57 PM, Tom Lane wrote:

> Hans-Juergen Schoenig <postgres(at)cybertec(dot)at> writes:
>> consider the following:
>
>> SELECT some_timestamp::date FROM very_large_table GROUP BY
>> some_timestamp::date
>
>> my very_large_table is around 1billion entries.
>> the problem is: the planner has a problem here as it is taking the
>> (correct) estimates for timestamp. this avoids a HashAggregate
>> because the dataset seems to large for work_mem.
>> what the planner cannot know is that the number of days is quite
>> limited (in my case around 1000 different values).
>> i wonder how to teach the planner to take the cast into
>> consideration.
>
> Create an index on that expression.
>
> regression=# create table foo(x) as select x * '864 sec'::interval
> + now()::timestamp from generate_series(1,10000) x;
> SELECT
> regression=# analyze foo;
> ANALYZE
> regression=# explain select x::date from foo group by x::date;
> QUERY PLAN
> ---------------------------------------------------------------
> HashAggregate (cost=205.00..330.00 rows=10000 width=8)
> -> Seq Scan on foo (cost=0.00..180.00 rows=10000 width=8)
> (2 rows)
>
> regression=# create index fooi on foo((x::date));
> CREATE INDEX
> regression=# analyze foo;
> ANALYZE
> regression=# explain select x::date from foo group by x::date;
> QUERY PLAN
> ---------------------------------------------------------------
> HashAggregate (cost=205.00..206.26 rows=101 width=8)
> -> Seq Scan on foo (cost=0.00..180.00 rows=10000 width=8)
> (2 rows)
>
> regression=#
>
> I had to cheat a little bit here: I tried to do this example with a
> timestamptz column, and the index creation failed because
> timestamptz to
> date isn't immutable (it depends on TimeZone). If yours is too, you
> could perhaps do something involving AT TIME ZONE to generate an
> immutable conversion to date.
>
> It would perhaps make sense to provide a way to cue ANALYZE to compute
> stats on expressions that aren't actually being indexed, but I see no
> good reason to limit our attention to cast expressions.
>
> regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Glaesemann 2006-09-04 15:21:50 Re: @ versus ~, redux
Previous Message Tom Lane 2006-09-04 15:13:25 Re: [COMMITTERS] pgsql: sslinfo contrib module - information about current SSL