Skip site navigation (1) Skip section navigation (2)

Performance problems on a fairly big table with two key columns.

From: "Rasmus Aveskogh" <rasmus(at)defero(dot)se>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance problems on a fairly big table with two key columns.
Date: 2003-09-04 22:53:46
Message-ID: 1564.62.119.108.236.1062716026.squirrel@www.defero.se (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,

I have a table that looks like this:

  DATA   ID   TIME
|------|----|------|

The table holds app. 14M rows now and grows by app. 350k rows a day.

The ID-column holds about 1500 unique values (integer).
The TIME-columns is of type timestamp without timezone.

I have one index (b-tree) on the ID-column and one index (b-tree) on the
time-column.

My queries most often look like this:

SELECT DATA FROM <tbl> WHERE ID = 1 AND TIME > now() - '1 day'::interval;

or

SELECT DATA FROM <tbl> WHERE ID = 2 AND TIME > now() - '1 week'::interval;


Since I have about 350000 rows the last 24 hours the query planner chooses
to use my ID-index to get hold of the rows - then using only a filter on
the time column.

This takes a lot of time (over a minute) on a P4 1900MHz which
unfortenately isn't good enough for my purpose (webpages times out and so
on..).


If I SELECT only the rows with a certain ID (regardless of time):

SELECT DATA FROM <tbl> WHERE ID = 3;

..it still takes almost a minute so I guess this is the problem (not the
filtering on the TIME-column), especially since it recieves a lot of rows
which will be descarded using my filter anyway.
(I recieve ~6000 rows and want about 250).

But using the TIME-column as a first subset of rows and discarding using
the ID-column as a filter is even worse since I then get 350k rows and
discards about 349750 of them using the filter.

I tried applying a multicolumn index on ID and TIME, but that one won't
even be used (after ANALYZE).

My only option here seems to have like a "daily" table which will only
carry the rows for the past 24 hours which will give my SELECT a result of
6000 initial rows out of ~350k (instead of 14M like now) and then 250 when
filtered.
But I really hope there is a cleaner solution to the problem - actually I
though a multicolumn index would do it.

-ra


Responses

pgsql-performance by date

Next:From: scott.marloweDate: 2003-09-04 22:58:19
Subject: Re: SELECT's take a long time compared to other DBMS
Previous:From: scott.marloweDate: 2003-09-04 22:35:08
Subject: Re: PostgreSQL Reliability when fsync = false on Linux-XFS

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group