Re: Indexing Strategy

From: Matthew Engelbert <mje1975(at)yahoo(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: Indexing Strategy
Date: 2004-12-11 00:57:02
Message-ID: 20041211005702.75029.qmail@web60302.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice pgsql-sql

The select query will be like this:

select * from quote where sybmol = 'MSFT' and extract(doy from time) = 340
order by time

I need to select all the records for a particular symbol, for a particular day,
ordered by time. I might need to select the records in the same way, but for a
shorter time interval, like some hour during the day. I'm using Postgresql
version 7.4.

Other info I forgot to mention.... this data is historical, and so I won't be
doing lots of inserts, updates or deletes - I'll just load a big chunk of data
once in a while, and rebuild the index. I realize that I can also add a lookup
table for symbol, as there are only about 150 unique values. Time values are
about 99% unique. Yes, I'm sure I need to tune the db better. I'm new to
Postgresql. Thanks for the help.

--- Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> Matthew,
>
> > I'm having trouble correctly indexing a large table. The table will be
> > receiving about 30 million rows/day, and has the following definition:
> >
> > CREATE TABLE quote (
> > symbol varchar (5),
> > price numeric (7,4),
> > time timestamp
> > );
> >
> > I need to configure the indices so that I can minimize the time to request
> > all the records for a particular symbol, for a particular day, ordered by
> > time.
>
> Please post the query you're using. How are you selecting the day? What
> version of PostgreSQL are you using?
>
> > I've tried clustering a single index based on symbol, then time, but this
> > operation takes much too long. I think this is because this operation
> > would have O(N^2) complexity(?).
>
> Nope, just because it's a very large table and you're probably swapping.
> Try
> increasing sort_mem dramatically, like half your RAM.
>
> > I think my next best option would be to
> > cluster on an index based on symbol, then have another index on time.
>
> Not that either. You need a multicolumn index. When you answer the above
> quesitons, I'll explain it.
>
> > Does any one know of some resources on the web that discusses indexing
> > strategies ? Any help would be appreciated? Thanks.
>
> Not that covers your question. I'm writing a book which covers this right
> now, but it won't be published until January 2006.
>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Michael Fuhr 2004-12-11 01:55:30 Re: Get comment
Previous Message Michael Fuhr 2004-12-11 00:55:42 Re: Version 8 & column order

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2004-12-11 02:28:38 Re: Query is not using index when it should
Previous Message Stephan Szabo 2004-12-11 00:48:24 Re: Cast NULL into Timestamp?