Re: best way to do bulk delete?

From: pg noob <pgnube(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: best way to do bulk delete?
Date: 2011-10-28 16:47:48
Message-ID: CAPNY-2VskCouya2aVOwQmT2BRhyd2VENe7ofz0Zt38Jjhnq2hg@mail.gmail.com
Lists: pgsql-novice

Thank you. That approach looks promising.
When you say it is not side-effect free, are there any side effects other
than the composite-types one you mentioned?
I don't believe these tables contain any composite types, so that shouldn't be
an issue.

I tested it and found that there is no ALTER TABLE <foo> RENAME TO <bar>
CASCADE;
it gives a syntax error on the CASCADE keyword. I am using Postgres 8.4;
maybe CASCADE is supported in a newer version.
But I am able to work around that by renaming the table's indexes and
constraints individually.
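
For anyone following the thread, the individual renames look roughly like
this (the index, table, and constraint names here are made up for
illustration; also note that 8.4 has no ALTER TABLE ... RENAME CONSTRAINT,
so named constraints generally have to be dropped and re-added):

```sql
-- Rename the replacement table's indexes to match the originals
-- (hypothetical names; adjust for your schema):
ALTER INDEX keeprows_pkey RENAME TO old_table_pkey;
ALTER INDEX keeprows_col_idx RENAME TO old_table_col_idx;

-- 8.4 cannot rename a constraint directly, so re-create it under
-- the desired name instead:
ALTER TABLE old_table DROP CONSTRAINT keeprows_col_check;
ALTER TABLE old_table ADD CONSTRAINT old_table_col_check CHECK (col > 0);
```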

Where this approach seems to break down is when the table has a lot of
dependent objects (dependent constraints, functions, views, triggers, etc.).
I could use DROP ... CASCADE, but then I would also need to save and recreate
all of those dependent objects.

But for simple tables that don't have a lot of dependent objects, your
suggested approach seems like it can work well.
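
For the record, the batched-delete alternative from my original post can be
driven with something like the following (purely a sketch; the batch size
and the delete condition are placeholders). Since 8.4 has no DELETE ...
LIMIT, ctid is used to bound each batch:

```sql
-- Delete up to 10000 matching rows; run repeatedly from application
-- code, committing after each round, until it reports DELETE 0.
DELETE FROM old_table
WHERE ctid IN (
    SELECT ctid FROM old_table
    WHERE <delete-condition>
    LIMIT 10000
);
```

Smaller batches hold fewer row locks at a time, at the cost of more
round trips.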

On Fri, Oct 28, 2011 at 11:19 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Fri, Oct 28, 2011 at 9:30 AM, pg noob <pgnube(at)gmail(dot)com> wrote:
> >
> > Greetings,
> >
> > I have an occasional recurring use case where I need to delete a large
> > number of rows from multiple tables as quickly as possible,
> > something like a bulk-delete.
> >
> > A typical example would be deleting a few hundred thousand rows at once from
> > a set of tables each containing 1 to 2 million rows,
> > but in a worst case scenario it could be as large as 1 million rows from a
> > set of tables each containing 4 to 5 million rows.
> >
> > These tables of course have indexes on them as well as foreign key
> > constraints and cascade deletes to other tables.
> >
> > I can't simply truncate the tables because the rows being deleted are
> > subsets of the total amount of data in the tables.
> >
> > These tables have heavy insert/update/delete activity going on at the same
> > time but mostly not on the same set of rows that
> > is being bulk-deleted (though there may be some update activity going on
> > which accesses those rows and hasn't yet quiesced).
> >
> > What is the best way of going about this?
> >
> > I've considered a few options.
> >
> > One option is to open a single transaction, issue delete statements that
> > delete huge numbers of rows from each
> > table in question (probably with a DELETE USING SQL query) and then commit
> > the transaction.
> > My concern with this approach is that it will hold a huge number of row
> > locks while in progress and may take a long
> > time to complete and could introduce deadlocks if it competes with other
> > updates that have acquired row locks out of order.
> >
> > Another option would be to delete one row per transaction or a smaller set
> > of rows as part of a transaction and then
> > commit the transaction, repeating in a loop.
> > This has the advantage that if the transaction hits an error and has to be
> > rolled back it doesn't have to redo the entire
> > delete operation again, and it doesn't hold as many row locks for as long a
> > time. The drawback is that I believe this approach
> > would be a lot slower.
> >
> > And finally, I've considered the idea of using COPY to copy the data that
> > needs to be kept to temporary tables,
> > truncating the original tables and then copying the data back. I believe
> > this would be the most efficient way to
> > do the delete but the implementation is quite a bit more complicated than
> > the first two options I described,
> > and has implications for how to deal with error scenarios or database/system
> > crashes while the operation is in progress.
>
> you don't need to COPY -- just insert/select, like this:
>
> CREATE TABLE keeprows (LIKE old_table INCLUDING INDEXES INCLUDING CONSTRAINTS);
> INSERT INTO keeprows SELECT * FROM old_table WHERE ...
> DROP TABLE old_table;
> ALTER TABLE keeprows RENAME to old_table CASCADE;
> <reset RI rules>
>
> This procedure is not side-effect free and has some caveats: for
> example, if you have code that depends on the table's composite
> type, that type will also be dropped and has to be reconstructed.
>
> Perhaps a better way to go if you can structure your code around it is
> to simply partition your table around when it gets loaded and dropped
> -- then your deletion becomes 'DROP TABLE' with no extra processing.
>
> merlin
>
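
(For the archives: Merlin's partitioning idea, using 8.4-style inheritance
partitions, might look roughly like this. The table and column names are
hypothetical, and in practice you would also want a trigger or application
logic routing inserts to the right child.)

```sql
-- One parent table plus one child table per bulk load; deleting a
-- load then becomes a cheap DROP TABLE instead of a huge DELETE.
CREATE TABLE events (load_id int NOT NULL, payload text);
CREATE TABLE events_load_42 (CHECK (load_id = 42)) INHERITS (events);

-- Bulk data goes into the child table directly...
INSERT INTO events_load_42 VALUES (42, 'example row');

-- ...and the "bulk delete" for that load is simply:
DROP TABLE events_load_42;
```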
