Re: Bulk processing & deletion

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Ivan Voras <ivoras(at)freebsd(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Bulk processing & deletion
Date: 2011-10-13 12:34:00
Message-ID: CAF-3MvOU1nUiD2wv6R4N53wQPJSTkm3aCW6-xfWdZZks8XXYQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 13 October 2011 14:20, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
> Hello,
>
> I have a table with a large number of records (millions), on which the
> following should be performed:
>
>        1. Retrieve a set of records by a SELECT query with a WHERE condition
>        2. Process these in the application
>        3. Delete them from the table
>
> Now, in the default read-committed transaction isolation, I can't just
> use the same WHERE condition with a DELETE in step 3 as it might delete
> more records than are processed in step 1 (i.e. phantom read). I've
> thought of several ways around it and would like some feedback on which
> would be the most efficient:
>
> #1: Create a giant DELETE WHERE ... IN (...) SQL command for step #3
> with primary keys of records from step 1 - but will it hit a SQL string
> length limitation in the database? Is there such a limit (and what is it?)
>
> #2: Same as #1 but with batching the records to e.g. 1000 at a time, all
> in one transaction
>
> #2: Use a higher isolation level, probably Repeatable Read (PG 9.0) -
> but then the question is will this block other clients from inserting
> new data into the table? Also, is Repeatable Read enough?
>
> Any other ideas?

CREATE TABLE to_delete (
job_created timestamp NOT NULL DEFAULT now(),
fk_id int NOT NULL
);

-- Mark for deletion
INSERT INTO to_delete (fk_id) SELECT id FROM table WHERE condition = true;

-- Process in app
SELECT table.* FROM table INNER JOIN to_delete ON (table.id = to_delete.fk_id);

-- Delete them
DELETE FROM table WHERE id IN (SELECT fk_id FROM to_delete);

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gregg Jaskiewicz 2011-10-13 12:37:57 Re: Bulk processing & deletion
Previous Message Raymond O'Donnell 2011-10-13 12:31:47 Re: Ideas for query