Re: Table with active and historical data

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Table with active and historical data
Date: 2011-06-02 01:56:04
Message-ID: BANLkTik-RnmigAg43K6vB0XRt6WvbyeWOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jun 1, 2011 at 7:30 PM, Robert James <srobertjames(at)gmail(dot)com> wrote:
> I have a table with a little active data and a lot of historical data.
>  I'd like to be able to access the active data very quickly - quicker
> than an index.  Here are the details:
>
> 1. Table has about 1 million records
> 2. Has a column active_date - on a given date, only about 1% are
> active.  active_date is indexed and clustered on.
> 3. Many of my queries are WHERE active_date = today.  Postgres uses
> the index for these, but still lakes quite a lot of time.  I repeat
> these queries regularly.

can we see a query and its 'explain analyze' that you think takes a lot of time?

> 4. I'd like to somehow presort or partition the data so that Postgres
> doesn't have to do an index scan each time.  I'm not sure how to do
> this? Idea?  I know it can be done with inheritance and triggers (
> http://stackoverflow.com/questions/994882/what-is-a-good-way-to-horizontal-shard-in-postgresql
> ), but that method looks a little too complex for me.  I'm looking for
> something simple.

an index scan should be good enough, but if it isn't you can look at
partitioning. let's make sure that's really necessary before doing it
however.

> 5. Finally, I should point out that I still do a large number of
> queries on historical data as well.
>
> What do you recommend? Ideas? Also: Why doesn't cluster on active_date
> solve the problem? Specifically, if I run SELECT * FROM full_table
> WHERE active_date = today, I get a cost of 3500.  If I first select
> those records into a new table, and then do SELECT * on the new table,
> I get a cost of 64.  Why is that? Why doesn't clustering pregroup
> them?

clustering is a tool that allows you to control which tuples are
grouped together on pages -- if you are pulling up more than one tuple
a time hopefully you can reduce the total number of pages you have to
scan by doing it. The bigger the table is, the more that matters.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2011-06-02 02:32:12 Re: SELECT to_timestamp crash PostgreSQL 9.1beta1
Previous Message Craig Ringer 2011-06-02 00:50:30 Re: Mixed up protocol packets in server response?