Re: Timestamp-based indexing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: "Harmon S(dot) Nine" <hnine(at)netarx(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Timestamp-based indexing
Date: 2004-07-26 15:59:11
Message-ID: 28247.1090857551@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:
> VACUUM FULL ANALYZE every 3 hours seems a little severe.

If rows are only deleted once a day, that's a complete waste of time,
indeed.

I'd suggest running a plain VACUUM just after the deletion pass is done.
ANALYZEs are a different matter and possibly need to be done every
few hours, seeing that your maximum timestamp value is constantly
changing.

>> monitor=# set enable_seqscan = false;
>> SET
>> monitor=# explain analyze select * from "eventtable" where timestamp >
>> CURRENT_TIMESTAMP - INTERVAL '10 minutes';
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on "eventtable" (cost=100000000.00..100019009.97 rows=136444
>> width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
>> Filter: (("timestamp")::timestamp with time zone >
>> (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
>> Total runtime: 9934.353 ms

You've got some datatype confusion, too. CURRENT_TIMESTAMP yields
timestamp with time zone, and since you made the timestamp column
timestamp without time zone, you've got a cross-type comparison which is
not indexable (at least not in 7.4). My opinion is that you chose the
wrong type for the column. Values that represent specific real-world
instants should always be timestamp with time zone, so that they mean
the same thing if you look at them in a different time zone.

Another issue here is that because CURRENT_TIMESTAMP - INTERVAL '10
minutes' isn't a constant, the planner isn't able to make use of the
statistics gathered by ANALYZE anyway. That's why the rowcount estimate
has nothing to do with reality. Unless you force the decision with
"set enable_seqscan", the planner will never pick an indexscan with this
rowcount estimate. The standard advice for getting around this is to
hide the nonconstant calculation inside a function that's deliberately
mislabeled immutable. For example,

create function ago(interval) returns timestamp with time zone as
'select now() - $1' language sql strict immutable;

select * from "eventtable" where timestamp > ago('10 minutes');

The planner folds the "ago('10 minutes')" to a constant, checks the
statistics, and should do the right thing. Note however that this
technique may break if you put a call to ago() inside a function
or prepared statement --- it's only safe in interactive queries,
where you don't care that the value is reduced to a constant during
planning instead of during execution.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2004-07-26 16:02:03 Re: Timestamp-based indexing
Previous Message Harmon S. Nine 2004-07-26 15:46:42 Re: Timestamp-based indexing