Re: best way to do bulk delete?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: pg noob <pgnube(at)gmail(dot)com>
Cc: pgsql-novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: best way to do bulk delete?
Date: 2011-10-28 15:19:53
Message-ID: CAHyXU0wism0HkYj9A9=9AKD=07igQ8MZ3vuvfcZ6s3L8B_fing@mail.gmail.com
Lists: pgsql-novice

On Fri, Oct 28, 2011 at 9:30 AM, pg noob <pgnube(at)gmail(dot)com> wrote:
>
> Greetings,
>
> I have an occasional recurring use case where I need to delete a large
> number of rows from multiple tables as quickly as possible,
> something like a bulk-delete.
>
> A typical example would be deleting a few hundred thousand rows at once from
> a set of tables each containing 1 to 2 million rows,
> but in a worst case scenario it could be as large as 1 million rows from a
> set of tables each containing 4 to 5 million rows.
>
> These tables of course have indexes on them as well as foreign key
> constraints and cascade deletes to other tables.
>
> I can't simply truncate the tables because the rows being deleted are
> subsets of the total amount of data in the tables.
>
> These tables have heavy insert/update/delete activity going on at the same
> time but mostly not on the same set of rows that
> is being bulk-deleted (though there may be some update activity going on
> which accesses those rows and hasn't yet quiesced).
>
> What is the best way of going about this?
>
> I've considered a few options.
>
> One option is to open a single transaction, issue delete statements that
> delete huge numbers of rows from each
> table in question (probably with a DELETE ... USING query) and then commit
> the transaction.
> My concern with this approach is that it will hold a huge number of row
> locks while in progress and may take a long
> time to complete and could introduce deadlocks if it competes with other
> updates that have acquired row locks out of order.
>
> Another option would be to delete one row per transaction or a smaller set
> of rows as part of a transaction and then
> commit the transaction, repeating in a loop.
> This has the advantage that if the transaction hits an error and has to be
> rolled back it doesn't have to redo the entire delete operation, and it
> doesn't hold as many row locks for as long.  The drawback is that I believe
> this approach would be a lot slower.
>
> And finally, I've considered the idea of using COPY to copy the data that
> needs to be kept to temporary tables,
> truncating the original tables and then copying the data back.  I believe
> this would be the most efficient way to
> do the delete but the implementation is quite a bit more complicated than
> the first two options I described,
> and has implications for how to deal with error scenarios or database/system
> crashes while the operation is in progress.
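
As an aside on option two: the batched delete is usually driven as a loop of
small statements from the client.  A minimal sketch, with a hypothetical
big_table and its id key standing in for the real tables and WHERE clause:

-- delete at most 10000 matching rows per statement; repeat from the
-- client until it reports DELETE 0
DELETE FROM big_table
WHERE id IN (SELECT id FROM big_table WHERE <condition> LIMIT 10000);

Run with autocommit, each pass commits on its own, which keeps lock
lifetimes short at the cost of rescanning for candidate rows every time.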

you don't need to COPY -- just insert/select, like this:

-- LIKE copies columns, indexes, and CHECK constraints, but not foreign keys
CREATE TABLE keeprows (LIKE old_table INCLUDING INDEXES INCLUDING CONSTRAINTS);
-- copy over only the rows you want to keep
INSERT INTO keeprows SELECT * FROM old_table WHERE ...;
-- CASCADE also drops foreign keys in other tables that reference old_table
DROP TABLE old_table CASCADE;
ALTER TABLE keeprows RENAME TO old_table;
<reset RI rules>  -- i.e., recreate the foreign keys the CASCADE removed
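
What that last step might look like, as a rough sketch -- child_table and
its columns are hypothetical stand-ins for whatever foreign keys the
CASCADE dropped:

-- recreate a hypothetical foreign key that pointed at old_table
ALTER TABLE child_table
    ADD CONSTRAINT child_table_old_table_id_fkey
    FOREIGN KEY (old_table_id) REFERENCES old_table (id)
    ON DELETE CASCADE;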

This procedure is not side-effect free and has some caveats: for example,
if you have code that depends on the table's composite type, that type is
dropped along with the table and has to be reconstructed.
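
To make that concrete, a hypothetical example -- the function and the id
column are invented for illustration:

-- a function taking the table's rowtype is dropped along with the table
-- and has to be recreated against the renamed one
CREATE FUNCTION describe(old_table) RETURNS text
    LANGUAGE sql AS $$ SELECT ($1).id::text $$;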

Perhaps a better way to go, if you can structure your code around it, is to
partition your table by when the data gets loaded and dropped -- then your
deletion becomes a plain 'DROP TABLE' with no extra processing.
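
A minimal sketch of that, using inheritance-based partitioning (the table
and column names are invented for illustration):

-- parent table plus one child per load batch
CREATE TABLE events (id bigint, loaded date, payload text);
CREATE TABLE events_2011_10 (
    CHECK (loaded >= '2011-10-01' AND loaded < '2011-11-01')
) INHERITS (events);
-- load each batch into its own child; when the batch expires, the whole
-- bulk delete collapses to:
DROP TABLE events_2011_10;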

merlin
