From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: DELETE FROM takes forever
Date: 2011-02-12 08:09:21
Message-ID: ij5f7h$shf$1@reversiblemaps.ath.cx
Lists: pgsql-sql
On 2011-02-10, Josh <slushie(at)gmail(dot)com> wrote:
> Hi
>
> I'm trying to do a DELETE FROM on my large table (about 800 million
> rows) based on the contents of another, moderately large table (about
> 110 million rows). The command I'm using is:
>
> DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);
>
> This process ran for about two weeks before I decided to stop it -- it
> was dragging down the DB server. I can understand long-running
> processes, but two weeks seems a bit much even for a big table.
I find that scripting deletes in smaller batches can help here;
long-running queries do bad things to the efficiency of postgres.
One strategy that could work for your task would be to create a temp
table first:
CREATE TEMP TABLE planned_deletions AS
  SELECT id FROM records EXCEPT SELECT id FROM unique_records;
CREATE INDEX badids ON planned_deletions(id);
then repeatedly run
DELETE FROM records WHERE id IN (SELECT id FROM planned_deletions ORDER BY id LIMIT 10000);
DELETE FROM planned_deletions WHERE id IN (SELECT id FROM planned_deletions ORDER BY id LIMIT 10000);
until there are none left.
possibly pausing a few seconds between each batch if there is a heavy
load on the server (that you were able to run the query for two weeks
suggests there may not be).
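The staged, batched delete described above can be sketched end-to-end. This is a minimal illustration using Python's built-in sqlite3 as a stand-in database; the table names follow the thread, but the toy data and the batch size of 10 are made up for demonstration (in Postgres you would run the same SQL via psql or a driver, with a much larger batch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the thread's tables: 100 records, of which
# unique_records keeps only the even ids.
cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE unique_records (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO records VALUES (?)", [(i,) for i in range(100)])
cur.executemany("INSERT INTO unique_records VALUES (?)",
                [(i,) for i in range(0, 100, 2)])

# Stage the ids to delete once, up front, and index them.
cur.execute("""CREATE TEMP TABLE planned_deletions AS
               SELECT id FROM records EXCEPT SELECT id FROM unique_records""")
cur.execute("CREATE INDEX badids ON planned_deletions(id)")

BATCH = 10
while True:
    # Delete the same ordered batch from both tables, then commit,
    # so each transaction stays short.
    cur.execute("""DELETE FROM records WHERE id IN
                   (SELECT id FROM planned_deletions ORDER BY id LIMIT ?)""",
                (BATCH,))
    cur.execute("""DELETE FROM planned_deletions WHERE id IN
                   (SELECT id FROM planned_deletions ORDER BY id LIMIT ?)""",
                (BATCH,))
    conn.commit()
    if cur.execute("SELECT count(*) FROM planned_deletions").fetchone()[0] == 0:
        break

# Only the ids present in unique_records survive.
remaining = cur.execute("SELECT count(*) FROM records").fetchone()[0]
print(remaining)
```

Ordering both subqueries by id guarantees each pass removes the same batch from records and from planned_deletions, so the staging table tracks progress exactly.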
> Is this the best way to approach the problem? Is there a better way?
>
> Some background: The server is version 8.3, running nothing but Pg.
> The 'records' table has 'id' as its primary key, and one other index
> on another column. The table is referenced by just about every other
> table in my DB (about 15 other tables) via foreign key constraints,
> which I don't want to break (which is why I'm not just recreating the
> table rather than deleting rows). Most of the dependent tables have ON
> DELETE CASCADE. The 'unique_records' table is a temp table I got via
> something like: SELECT DISTINCT (other_column) id INTO unique_records
> FROM records
if you can accept the down-time I would drop the constraints
(if you don't have them on file, do a pg_dump --schema-only and grep it
for the ADD CONSTRAINT commands, then use sed or similar to create matching
DROP CONSTRAINT commands and run them),
then rebuild the table,
then reintroduce the constraints; keep a copy of the grep output above
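The grep-and-rewrite step might look something like this. The schema excerpt, table names, and constraint names here are invented for illustration; in practice the input would come from `pg_dump --schema-only yourdb`:

```shell
# Stand-in for a real schema-only dump (normally: pg_dump --schema-only yourdb > schema.sql)
cat > schema.sql <<'EOF'
ALTER TABLE ONLY orders
    ADD CONSTRAINT orders_record_fk FOREIGN KEY (record_id) REFERENCES records(id) ON DELETE CASCADE;
ALTER TABLE ONLY invoices
    ADD CONSTRAINT invoices_record_fk FOREIGN KEY (record_id) REFERENCES records(id) ON DELETE CASCADE;
EOF

# pg_dump puts each ADD CONSTRAINT on the line after its ALTER TABLE;
# pair them up and emit a matching DROP CONSTRAINT for each.
grep -B1 'ADD CONSTRAINT' schema.sql \
  | awk '/ALTER TABLE/ {tbl=$NF}
         /ADD CONSTRAINT/ {print "ALTER TABLE " tbl " DROP CONSTRAINT " $3 ";"}' \
  > drop_constraints.sql

cat drop_constraints.sql
```

Keeping schema.sql around gives you the exact ADD CONSTRAINT statements to replay after the rebuild.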
--
⚂⚃ 100% natural