Re: Function-based index not used in a simple query

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rostislav Opocensky <orbis(at)pictus(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org, Pavel Noga <pnoga(at)unreal(dot)cz>
Subject: Re: Function-based index not used in a simple query
Date: 2000-05-30 21:36:36
Message-ID: 3425.959722596@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Rostislav Opocensky <orbis(at)pictus(dot)org> writes:
> SELECT *
> FROM vals
> WHERE trunc_to_day(timestamp) = trunc_to_day('28.5.2000');
>
> Seq Scan on vals (cost=0.00..27.50 rows=10 width=44)

The problem here is that the optimizer will only consider an indexscan
for a clause that looks like index_key OP constant. It doesn't think
that trunc_to_day('28.5.2000') is a constant, because you haven't told
it that it can pre-evaluate that function call --- and for all it knows,
the function might generate different values when evaluated at different
rows of the table, rendering an indexscan incorrect. (Consider a
function like random() ...) If you mark trunc_to_day() as a "cachable"
function then the system will know it's safe to collapse the righthand
side to a constant:

CREATE FUNCTION trunc_to_day(timestamp) RETURNS timestamp AS '
DECLARE tstamp ALIAS FOR $1;
BEGIN
RETURN date_trunc(''day'', tstamp);
END;
' LANGUAGE 'plpgsql'
WITH (isCachable);

This is a little bit of a cheat for this function, because what
isCachable actually implies is that the function's value depends
on the presented arguments *and nothing else*. I believe date_trunc
depends on the current TZ setting, so it's not really cachable.
You could possibly get away with it for your application, but you
might run into problems like this:

1. You write trunc_to_day('28.5.2000') in a table's column
default expression, stored rule, or some such. The system
decides it can fold that expression down to a simple constant,
giving you (say) 2000-05-20 00:00-04 --- ie, midnight in
whatever timezone you are using. That value gets stored in
the column or rule definition.

2. You then access the database while running with a different
TZ setting. You wonder why you don't get the answers you expected.

Come to think of it, your index on trunc_to_day() is fundamentally
risky because different backends running with different TZ settings
would generate different index entries for the "same" data entry.
For example, if you make an entry dated '28.5.2000' while running in
one timezone, and then try to query with "= trunc_to_day('28.5.2000')"
while running in another timezone, you won't match that entry if the
index is used, because the timestamp generated on-the-fly from
trunc_to_day('28.5.2000') won't equal the one stored in the index.

(Hmm ... I wonder if that suggests that we shouldn't allow indexes
to be built using functions that are not isCachable?)

You could avoid some of these risks by having trunc_to_day return
a 'date' rather than a 'timestamp', thereby removing at least some
of the timezone dependency from the index definition.

regards, tom lane

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Ryan Bradetich 2000-05-30 23:48:42 Re: Use of index in 7.0 vs 6.5
Previous Message mikeo 2000-05-30 19:07:59 Re: Re: [SQL] remove line type?