Re: Delete/update with limit

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Delete/update with limit
Date: 2007-07-23 18:52:13
Message-ID: 60644bymua.fsf@dba2.int.libertyrms.com

nagy(at)ecircle-ag(dot)com (Csaba Nagy) writes:
> In Postgres we're currently not chunking, because the code to do it is
> simply too contorted and inefficient compared to the other DBs we use.
> All the solutions we could think of for doing the chunking safely while
> the inserts run in parallel, without disturbing them, have invariably
> resulted in overly complicated code compared to the simple
> delete-with-limit + delete-trigger solution we have for the other DBs.
>
> Now, I don't hold out much hope of convincing anybody that a limit on
> the DELETE/UPDATE commands has valid usage scenarios, but can anybody
> help me find a good solution to process such a buffer table chunk-wise,
> where insert speed is the highest priority (thus no indexes, the
> minimum of fields), batch processing should still work fine with a big
> table, the inserts must not be impacted at all, and each batch should
> finish quickly to avoid long-running transactions? Because I can't
> really think of one... other than our scheme of delete with limit +
> trigger + private temp table.
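
For context, the construct being asked for here is the one MySQL accepts
and PostgreSQL does not; a one-line illustration (table name is just a
placeholder):

   DELETE FROM incoming_table LIMIT 500;   -- valid in MySQL, a syntax
                                           -- error in PostgreSQL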

All that comes to mind is to put a SERIAL primary key on the table,
which shouldn't be *too* expensive an overhead, assuming there is
reasonably complex processing going on; you then do something like:

- select ID from the incoming table, order by ID, limit 500, to grab a
  list of IDs;

- delete from the table for that set of IDs (sketched below).
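
A minimal sketch of that two-step approach; the DDL, the batch size,
and the bound-in ID array are illustrative, not taken from your schema:

   -- the SERIAL column is the indexing overhead under discussion
   CREATE TABLE incoming_table (
       id      SERIAL PRIMARY KEY,
       payload TEXT NOT NULL        -- stand-in for the real columns
   );

   -- step 1: grab a batch of IDs (and process those rows client-side)
   SELECT id FROM incoming_table ORDER BY id LIMIT 500;

   -- step 2: delete exactly that batch, feeding the IDs back in
   -- as a bound array parameter ($1 = the IDs fetched in step 1)
   DELETE FROM incoming_table WHERE id = ANY ($1);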

Actually, is there any particular reason why you couldn't simply have
your "batch processing" loop look like:

Loop Forever
    DELETE FROM incoming_table;
    VACUUM incoming_table;
End Loop;

???

The alternative that I suggested amounts to:

Loop Forever
    DELETE FROM incoming_table
     WHERE id IN (SELECT id FROM incoming_table ORDER BY id LIMIT 500);
    VACUUM incoming_table;
End Loop;
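
One variant worth noting, assuming you're on PostgreSQL 8.2 or later:
DELETE ... RETURNING hands the deleted rows back in the same statement,
so the batch can be processed without a separate SELECT (a sketch, not
tested against your schema):

   DELETE FROM incoming_table
    WHERE id IN (SELECT id FROM incoming_table ORDER BY id LIMIT 500)
    RETURNING *;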

I realize you're concerned that maintaining the index will be too
costly; I don't think it is obvious without actual benchmarking that
this is *in fact* too costly.
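
One quick way to turn the guess into a number (a rough sketch in psql;
generate_series and the row count are arbitrary):

   \timing
   -- bulk-load with the SERIAL primary key in place
   INSERT INTO incoming_table (payload)
       SELECT 'x' FROM generate_series(1, 100000);
   -- repeat against an otherwise identical table that has no primary
   -- key, and compare the two timings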

I'm pretty sure of one countervailing consideration: VACUUMing the
table carries its own cost; it is possible that the cost of maintaining
the index would be noise against that.
--
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/lisp.html
When a man talks dirty to a woman, it's sexual harassment. When a woman
talks dirty to a man, it's 3.95 per minute.
