Re: [HACKERS] Maintaining cluster order on insert

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: genekhart(at)gmail(dot)com
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Maintaining cluster order on insert
Date: 2006-08-10 02:49:43
Message-ID: 9461.1155178183@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Gene <genekhart(at)gmail(dot)com> writes:
> I have a table that inserts lots of rows (million+ per day) int8 as primary
> key, and I cluster by a timestamp which is approximately the timestamp of
> the insert beforehand and is therefore in increasing order and doesn't
> change. Most of the rows are updated about 3 times over time roughly within
> the next 30 minutes.

ISTM you should hardly need to worry about clustering that --- the data
will be in timestamp order pretty naturally.

The main problem you're going to have is the update-3-times bit. You
could keep updated rows on the same page as the original if you ran the
table at fillfactor 25% (which you'll be able to do in 8.2) ... but
while this might be sane for the leading edge of the table, you hardly
want such low storage density in the stable part.

You could reduce the fillfactor requirement if you could vacuum the
table constantly (every 10 minutes or so) but I assume the table is
large enough to make that unattractive. (Eventually we should have
a version of vacuum that understands where the dirty stuff is, which
might make this approach tenable ... but not in 8.2.)

Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2006-08-10 03:08:51 Re: Buildfarm failure on ecpg/test/pgtypeslib
Previous Message Bruce Momjian 2006-08-10 02:39:49 Re: [HACKERS] pgsql: Move "#define inline __inline"

Browse pgsql-patches by date

  From Date Subject
Next Message Gene 2006-08-10 05:18:00 Re: [HACKERS] Maintaining cluster order on insert
Previous Message Satoshi Nagayasu 2006-08-10 02:14:18 Re: pgstattuple extension for indexes