Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?

From: Erik Jones <erik(at)myemma(dot)com>
To: Jamie Tufnell <diesql(at)googlemail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
Date: 2008-01-08 18:28:52
Message-ID: 87236C1C-B08E-4521-A141-C7C65EDD3545@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Jan 8, 2008, at 11:41 AM, Jamie Tufnell wrote:

> On 1/8/08, codeWarrior <gpatnude(at)hotmail(dot)com> wrote:
>> Jamie:
>>
>> I think you are probably having slowdown issues in your "DELETE
>> FROM WHERE
>> NOT IN SELECT ORDER BY DESCENDING" construct -- that seems a bit
>> convoluted
>> to me....
>
> Hmm so rather than NOT IN ( .. LIMIT 50) would you suggest IN ( ...
> OFFSET 50) like in Erik's example? Or something else entirely?

Well, that would give you some gain. Think about it like this: once
a given user's history records are at 50 and you insert a row, if you
use the NOT IN clause your comparing each of 51 rows to each of the
50 you want to keep to find the one that can go while with the IN
version your comparing each of the 51 rows to the 1 that can go. Now
how much of a gain that will be I can't say, YMMV. I don't remember
you saying anything about it so I'll also go ahead and point out that
you most likely will want an index on user_id if you don't already.

>> ALSO: It looks to me like you have a column named "timestamp' ???
>> This is
>> bad practice since "timestamp" is a reserved word... You really
>> ought NOT to
>> use reserved words for column names... different debate.
>
> I do realize it would be better to use something else and thanks for
> the tip :-) This is an established database and "timestamp" has been
> used in other tables which is why I stuck to it here.. one day when
> time permits maybe I'll rename them all!
>
>> Why bother deleting records anyway ? Why not alter your query that
>> tracks
>> the 50 records to LIMIT 50 ???
>
> The read query does LIMIT 50 and the reason for deleting the rest of
> the records is because they're not needed by the application and
> there's loads of them being created all the time (currently several
> million unnecessary rows) -- I imagine eventually this will slow
> things down?
>
> Do you think a regular batch process to delete rows might be more
> appropriate than a trigger in this scenario?

That depends on your usage pattern. Assuming you aren't running user
history report queries constantly that's probably what I'd do. Also,
if you're sure you won't need anything but the last 50 records per
user, I'd definitely agree with cleaning out data that's not needed.

Erik Jones

DBA | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Jamie Tufnell 2008-01-08 18:49:06 Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
Previous Message Gerardo Herzig 2008-01-08 18:19:21 trigger for TRUNCATE?