Re: Timestamp-based indexing

From: "Harmon S(dot) Nine" <hnine(at)netarx(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Timestamp-based indexing
Date: 2004-07-26 15:46:42
Message-ID: 41052762.50801@netarx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We were getting a little desperate, so we engaged in overkill to rule
out lack-of-analyze as a cause for the slow queries.

Thanks for your advice :)

-- Harmon

Matthew T. O'Connor wrote:

> VACUUM FULL ANALYZE every 3 hours seems a little severe. You will
> probably be be served just as well by VACUUM ANALYZE. But you
> probably don't need the VACUUM part most of the time. You might try
> doing an ANALYZE on the specific tables you are having issues with.
> Since ANALYZE should be much quicker and not have the performance
> impact of a VACUUM, you could do it every hour, or even every 15 minutes.
>
> Good luck...
>
> Harmon S. Nine wrote:
>
>> Hello --
>>
>> To increase query (i.e. select) performance, we're trying to get
>> postgres to use an index based on a timestamp column in a given table.
>>
>> Event-based data is put into this table several times a minute, with
>> the timestamp indicating when a particular row was placed in the table.
>>
>> The table is purged daily, retaining only the rows that are less than
>> 7 days old. That is, any row within the table is less than 1 week
>> old (+ 1 day, since the purge is daily).
>>
>> A typical number of rows in the table is around 400,000.
>>
>> A "VACUUM FULL ANALYZE" is performed every 3 hours.
>>
>>
>> The problem:
>> We often query the table to extract those rows that are, say, 10
>> minutes old or less.
>>
>> Given there are 10080 minutes per week, the planner could, properly
>> configured, estimate the number of rows returned by such a query to be:
>>
>> 10 min/ 10080 min * 400,000 = 0.001 * 400,000 = 400.
>>
>> Making an index scan, with the timestamp field the index, far faster
>> then a sequential scan.
>>
>>
>> However, we can't get the planner to do an timestamp-based index scan.
>>
>> Anyone know what to do?
>>
>>
>> Here's the table specs:
>>
>> monitor=# \d "eventtable"
>> Table "public.eventtable"
>> Column | Type |
>> Modifiers
>> -----------+-----------------------------+--------------------------------------------------------------
>>
>> timestamp | timestamp without time zone | not null default
>> ('now'::text)::timestamp(6) with time zone
>> key | bigint | not null default
>> nextval('public."eventtable_key_seq"'::text)
>> propagate | boolean |
>> facility | character(10) |
>> priority | character(10) |
>> host | character varying(128) | not null
>> message | text | not null
>> Indexes:
>> "eventtable_pkey" primary key, btree ("timestamp", "key")
>> "eventtable_host" btree (host)
>> "eventtable_timestamp" btree ("timestamp")
>>
>>
>> Here's a query (with "explain analyze"):
>>
>> monitor=# explain analyze select * from "eventtable" where timestamp
>> > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on "eventtable" (cost=0.00..19009.97 rows=136444 width=155)
>> (actual time=11071.073..11432.522 rows=821 loops=1)
>> Filter: (("timestamp")::timestamp with time zone >
>> (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
>> Total runtime: 11433.384 ms
>> (3 rows)
>>
>>
>> Here's something strange. We try to disable sequential scans, but to
>> no avail. The estimated cost skyrockets, though:
>>
>> monitor=# set enable_seqscan = false;
>> SET
>> monitor=# explain analyze select * from "eventtable" where timestamp
>> > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on "eventtable" (cost=100000000.00..100019009.97
>> rows=136444 width=155) (actual time=9909.847..9932.438 rows=1763
>> loops=1)
>> Filter: (("timestamp")::timestamp with time zone >
>> (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
>> Total runtime: 9934.353 ms
>> (3 rows)
>>
>> monitor=# set enable_seqscan = true;
>> SET
>> monitor=#
>>
>>
>>
>> Any help is greatly appreciated :)
>>
>> -- Harmon
>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>> message can get through to the mailing list cleanly
>>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-07-26 15:59:11 Re: Timestamp-based indexing
Previous Message Harmon S. Nine 2004-07-26 15:42:26 Re: Timestamp-based indexing