Re: Replacing old tuples on insert

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: "Pradeepkumar, Pyatalo (IE10)" <Pradeepkumar(dot)Pyatalo(at)honeywell(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Replacing old tuples on insert
Date: 2004-11-27 07:57:18
Message-ID: 20041127075717.GA76802@winnie.fuhr.org
Lists: pgsql-novice

On Wed, Nov 24, 2004 at 02:20:42AM -0700, Pradeepkumar, Pyatalo (IE10) wrote:
>
> I have a table as follows -
> table TrendData( TrendID bigserial,
> TrendName varchar(30),
> .....
> .....);
>
> Now for this table i want a maximum of only 5000 tuples. After the maximum
> is reached the new tuples that are inserted should replace the older ones.
> Could anyone help me out in writing the query or a trigger would also be
> fine.

How accurate does the tuple count need to be? A simple but flawed
approach would be to use a rule like the following:

CREATE RULE TrendTrunc AS ON INSERT TO TrendData DO ALSO
    DELETE FROM TrendData
    WHERE TrendID <= currval('trenddata_trendid_seq') - 5000;

This rule is flawed because it doesn't account for gaps in the
sequence, which can exist due to deleted rows or rolled-back
transactions (sequence values are never reclaimed). Such gaps
could cause the rule to truncate the table below the desired
5000 rows.
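The gap problem is easy to reproduce (assuming the table and the
default sequence name above): a rolled-back insert still consumes
a sequence value.

```sql
-- Sequence values are consumed even when the transaction rolls back:
BEGIN;
INSERT INTO TrendData (TrendName) VALUES ('rolled back');  -- uses one TrendID
ROLLBACK;                                                  -- leaves no row behind
INSERT INTO TrendData (TrendName) VALUES ('kept');
-- currval('trenddata_trendid_seq') is now two higher than before, but
-- only one new row exists, so the cutoff currval() - 5000 points past
-- the 5000th-newest row and the rule deletes one row too many.
```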

Here's a rule that handles gaps but is also flawed:

CREATE RULE TrendTrunc AS ON INSERT TO TrendData DO ALSO
    DELETE FROM TrendData
    WHERE TrendID < (SELECT MIN(TrendID)
                     FROM (SELECT TrendID
                           FROM TrendData
                           ORDER BY TrendID DESC
                           LIMIT 5000) AS s);

This rule could leave the table with more than 5000 rows if multiple
transactions are inserting at the same time.
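A sketch of the race (the session labels are illustrative): under
the default READ COMMITTED isolation, each session's DELETE subquery
cannot see the other session's uncommitted row.

```sql
-- Session A: INSERT INTO TrendData ...;  -- its subquery sees 5000 rows,
--                                        -- not B's uncommitted row
-- Session B: INSERT INTO TrendData ...;  -- likewise sees only 5000 rows
-- Both sessions COMMIT                   -- table now holds 5001 rows
--
-- Neither DELETE removed anything, so the limit stays exceeded until
-- a later insert happens to clean up the excess.
```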

The following trigger solution should handle gaps and strictly
enforce the 5000-row limit: it locks the table before deleting
excess records, so it waits for other transactions that have
inserted or deleted records to complete. (Unfortunately this
will hurt performance if concurrent transactions are doing
inserts or deletes.) It uses a trigger because rules don't allow
LOCK. The trigger fires after each statement rather than after
each row, so an INSERT that adds multiple rows deletes excess
rows only once, after all the new rows have been inserted.

I haven't considered all possible situations and I've done only
simple testing, so the solution could have flaws. An improvement
would be to make the function reusable for multiple tables with
differing field names and maximum row counts; this is left as an
exercise for the reader.

CREATE FUNCTION TrendTrunc() RETURNS TRIGGER AS '
BEGIN
    LOCK TABLE TrendData IN SHARE ROW EXCLUSIVE MODE;

    DELETE FROM TrendData
    WHERE TrendID < (SELECT MIN(TrendID)
                     FROM (SELECT TrendID
                           FROM TrendData
                           ORDER BY TrendID DESC
                           LIMIT 5000) AS s);

    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER TrendAfterInsert AFTER INSERT ON TrendData
    FOR EACH STATEMENT EXECUTE PROCEDURE TrendTrunc();
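As a starting point for the reusability exercise, here's an untested
sketch of a generic version. The function name TruncToLimit is my
own invention; the table name, key column, and row limit are passed
as trigger arguments and read from TG_ARGV, with the queries built
and run via EXECUTE.

```sql
CREATE FUNCTION TruncToLimit() RETURNS TRIGGER AS '
-- TG_ARGV[0] = table name, TG_ARGV[1] = key column, TG_ARGV[2] = row limit
BEGIN
    EXECUTE ''LOCK TABLE '' || TG_ARGV[0] || '' IN SHARE ROW EXCLUSIVE MODE'';

    EXECUTE ''DELETE FROM '' || TG_ARGV[0]
         || '' WHERE '' || TG_ARGV[1]
         || '' < (SELECT MIN('' || TG_ARGV[1] || '')''
         || '' FROM (SELECT '' || TG_ARGV[1]
         || '' FROM '' || TG_ARGV[0]
         || '' ORDER BY '' || TG_ARGV[1] || '' DESC''
         || '' LIMIT '' || TG_ARGV[2] || '') AS s)'';

    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER TrendAfterInsert AFTER INSERT ON TrendData
    FOR EACH STATEMENT
    EXECUTE PROCEDURE TruncToLimit('TrendData', 'TrendID', '5000');
```

Note that trigger arguments arrive as text, and the dynamic SQL does
no quoting of the identifiers, so use this only with trusted argument
values.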

If anybody sees problems or has suggestions for improvements then
please follow up. In particular, is there a more efficient way
to get the cutoff TrendID?
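One idea I haven't benchmarked: fetch the 5000th-newest key directly
with OFFSET instead of taking MIN() over a 5000-row subquery:

```sql
DELETE FROM TrendData
WHERE TrendID < (SELECT TrendID
                 FROM TrendData
                 ORDER BY TrendID DESC
                 LIMIT 1 OFFSET 4999);
```

If the table holds fewer than 5000 rows the subquery returns no row,
the comparison is against NULL, and the DELETE is a no-op, which
matches the behavior of the MIN() version.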

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
