Re: [HACKERS] Maintaining cluster order on insert

From: Gene <genekhart(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Maintaining cluster order on insert
Date: 2006-08-10 05:18:00
Message-ID: 430d92a20608092218u7f44f8baob801aea8b136e5dd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

You are correct the main part I'm worried about is the updates, being so far
from the originals. fyi I am partitioning the tables by the timestamp
column,vacuum analyzing once per hour, creating one child partition per day
in a cron job. Because I'm using hibernate for database abstraction
(stateless sessions), I can only have one RULE since having more than one
insert rule will not return the correct number of updated rows which
confuses hibernate. The one rule just directs inserts to the latest child
partition which seems to work well.

The reason I'm doing the clustering is I was hoping that with the "stable"
non-updating partitions I could execute a CLUSTER at night (slow...) and it
would compact the tables into their most efficient state for querying which
always involves a date range. bad idea? In this fillfactor feature, will you
be able to set it to 100% once you know that no more updates will occur? Or
will doing a cluster effectively do this? Will the fill factor only apply
for inserts?

"Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care."

This sounds interesting, I could create a RULE/INSERT on the unstable table,
I will know during the update if it is ready to be put in the stable table.
What would be an efficient way to do the transfer? Since the updates occur
somewhat randomly, wouldnt the tuples in the stable table then be out of
natural timestamp order?

thanks for all of your help and comments! it is greatly appreciated!
Gene Hart

On 8/9/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Gene <genekhart(at)gmail(dot)com> writes:
> > I have a table that inserts lots of rows (million+ per day) int8 as
> primary
> > key, and I cluster by a timestamp which is approximately the timestamp
> of
> > the insert beforehand and is therefore in increasing order and doesn't
> > change. Most of the rows are updated about 3 times over time roughly
> within
> > the next 30 minutes.
>
> ISTM you should hardly need to worry about clustering that --- the data
> will be in timestamp order pretty naturally.
>
> The main problem you're going to have is the update-3-times bit. You
> could keep updated rows on the same page as the original if you ran the
> table at fillfactor 25% (which you'll be able to do in 8.2) ... but
> while this might be sane for the leading edge of the table, you hardly
> want such low storage density in the stable part.
>
> You could reduce the fillfactor requirement if you could vacuum the
> table constantly (every 10 minutes or so) but I assume the table is
> large enough to make that unattractive. (Eventually we should have
> a version of vacuum that understands where the dirty stuff is, which
> might make this approach tenable ... but not in 8.2.)
>
> Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care.
>
> regards, tom lane
>

--
Eugene Hart

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2006-08-10 06:32:06 Re: Forcing current WAL file to be archived
Previous Message Tom Lane 2006-08-10 04:34:38 Re: 8.2 features status

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2006-08-10 06:32:06 Re: Forcing current WAL file to be archived
Previous Message Tom Lane 2006-08-10 02:49:43 Re: [HACKERS] Maintaining cluster order on insert