how to delete many rows from a huge table?

From: mARK bLOORE <mbloore(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: how to delete many rows from a huge table?
Date: 2009-05-25 19:29:54
Message-ID: d9425f7b0905251229g289edc15u4bf9dded7592846@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a table of about 1 G rows, and I want to delete about 5 M rows,
listed in another table.
The big table is

Table "public.backlinks"
Column | Type | Modifiers | Description
-----------------+-------+-----------+-------------
key | bytea | not null |
backlink | text | not null |
backlink_hash | bytea | not null |
url | text | not null |
time_downloaded | date | not null |
Triggers:
insert_backlinks_trigger BEFORE INSERT ON backlinks FOR EACH ROW
EXECUTE PROCEDURE backlinks_insert_trigger()
Has OIDs: no

It is divided into 64 partitions, like

Table "public.backlinks_0"
Column | Type | Modifiers | Description
-----------------+-------+-----------+-------------
key | bytea | not null |
backlink | text | not null |
backlink_hash | bytea | not null |
url | text | not null |
time_downloaded | date | not null |
Indexes:
"backlinks_0_pkey" PRIMARY KEY, btree (key, backlink_hash),
tablespace "fastdisk"
Check constraints:
"backlinks_0_key_check" CHECK (partition(key) = 0)
Inherits: backlinks
Has OIDs: no

The partitions are very evenly filled. The partition function just
takes the first byte modulo 64.

The table listing what I want to delete has just the key values.

If I just do

DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;

then it grinds for an hour or so and runs out of memory.

If I do

DELETE FROM backlinks b WHERE b.key IN (SELECT bl.key FROM bad_links
bl LIMIT 40 OFFSET 0);

it finishes in milliseconds, but if I set the LIMIT to 50, it runs for
minutes til I kill it. EXPLAIN says that it is doing a sequential
scan on several of the partitions in the 50 case, but not the 40.
Auto-vacuuming is running on the DB.

If I put that DELETE into a procedure and loop on the offset, it acts
like the 50 case, even if i set the LIMIT to 1.

I am currently running a shell loop doing the delete in chunks of 40,
but it has been running all weekend and is only half way through.

What am I doing wrong?

--
mARK bLOORE <mbloore(at)gmail(dot)com>

Responses

Browse pgsql-general by date

  From Date Subject
Next Message iSTRONG 2009-05-25 21:50:55 Re: PGS Tuning Wizard destroys my login
Previous Message Grzegorz Jaśkiewicz 2009-05-25 16:23:35 composite type and domain