Re: optimizing selects on time-series data in Pg

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Aditya <aditya(at)grot(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, <sfpug(at)postgresql(dot)org>
Subject: Re: optimizing selects on time-series data in Pg
Date: 2003-08-01 18:07:41
Message-ID: 20030801110243.C55372-100000@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

On Fri, 1 Aug 2003, Aditya wrote:

> On Fri, Aug 01, 2003 at 09:58:24AM -0700, Josh Berkus wrote:
> > Can you post your "slow queries"?
>
> Here's what I did:
>
> - I did a vacuum
>
> - did a select count(*) for a few different virtualhosts to see how many rows
> were being dealt with
>
> - run explain analyze on each of those select count(*) and noted the
> difference, ie. the planner decided that a select on a virtual host with many
> more rows was better served by sequential scan than index scan
>
> The typical query is something like show me all the rows for a virtualhost in
> the last two days with the latest first, ie.
>
> select
> *
> from zp_log
> where
> timestamp > (now() - interval '2 days')
> and virtualhost = 'www.bloki.com';
>
> - dropped the virtualhost index and recreated it, verified the behaviour was
> the same as above
>
> - forced the planner not to use sequential scan and verified that the index
> scan was in fact more costly then, here's a somewhat edited transcript (only
> for brevity):

Hmm, does a two column index on timestamp,virtualhost (or possibly the
other way around) help at all when forcing an index scan? I wouldn't
guess that it'd help enough, but it might be worth trying.

In response to

Responses

Browse sfpug by date

  From Date Subject
Next Message Aditya 2003-08-01 18:25:07 Re: optimizing selects on time-series data in Pg
Previous Message Josh Berkus 2003-08-01 18:05:22 Re: optimizing selects on time-series data in Pg