DELETE performance problem

From: Luca Tettamanti <kronos(dot)it(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: DELETE performance problem
Date: 2009-11-24 13:37:08
Message-ID: 20091124133708.GA6235@nb-core2.darkstar.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,
I've run in a severe performance problem with the following statement:

DELETE FROM t1 WHERE t1.annotation_id IN (
SELECT t2.annotation_id FROM t2)

t1 contains about 48M record (table size is 5.8GB), while t2 contains about 60M
record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's
not even unique, in fact there are duplicates - there are about 20M distinct
annotation_id in this table). There are no FKs on either tables.
I've killed the query after 14h(!) of runtime...

I've reproduced the problem using a only the ids (extracted from the full
tables) with the following schemas:

test2=# \d t1
Table "public.t1"
Column | Type | Modifiers
---------------+--------+-----------
annotation_id | bigint | not null
Indexes:
"t1_pkey" PRIMARY KEY, btree (annotation_id)

test2=# \d t2
Table "public.t2"
Column | Type | Modifiers
---------------+--------+-----------
annotation_id | bigint |
Indexes:
"t2_idx" btree (annotation_id)

The query above takes about 30 minutes to complete. The slowdown is not as
severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using
procexp I see the process churning the disk and using more memory until it hits
some limit (at about 1.8GB) then the IO slows down considerably. See this
screenshot[1].
This is exactly what happens with the full dataset.

This is the output of the explain:

test2=> explain analyze delete from t1 where annotation_id in (select annotation
_id from t2);
QUERY PLAN

--------------------------------------------------------------------------------
---------------------------------------------------------
Hash Join (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=64339
5.565..1832056.588 rows=26185953 loops=1)
Hash Cond: (t1.annotation_id = t2.annotation_id)
-> Seq Scan on t1 (cost=0.00..661734.12 rows=45874812 width=14) (actual tim
e=0.291..179119.487 rows=45874812 loops=1)
-> Hash (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=6433
93.742..643393.742 rows=26185953 loops=1)
-> HashAggregate (cost=1031681.15..1033497.20 rows=181605 width=8) (a
ctual time=571807.575..610178.552 rows=26185953 loops=1)
-> Seq Scan on t2 (cost=0.00..879289.12 rows=60956812 width=8)
(actual time=2460.595..480446.581 rows=60956812 loops=1)
Total runtime: 2271122.474 ms
(7 rows)

Time: 2274723,284 ms

An identital linux machine (with 8.4.1) shows the same issue; with strace I see
a lots of seeks:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
90.37 0.155484 15 10601 read
9.10 0.015649 5216 3 fadvise64
0.39 0.000668 0 5499 write
0.15 0.000253 0 10733 lseek
0.00 0.000000 0 3 open
0.00 0.000000 0 3 close
0.00 0.000000 0 3 semop
------ ----------- ----------- --------- --------- ----------------
100.00 0.172054 26845 total

(30s sample)

Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the trace
is the following:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.063862 0 321597 read
0.00 0.000000 0 3 lseek
0.00 0.000000 0 76 mmap
------ ----------- ----------- --------- --------- ----------------
100.00 0.063862 321676 total

The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data directory
is on hardware (Dell PERC5) raid mirror, with the log on a separate array.
One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit).

shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off # Just in case... usually it's enabled
effective_cache_size = 4096MB

(the machine with win2k8 is running with a smaller shared_buffers - 16MB)

Any idea on what's going wrong here?

thanks,
Luca
[1] http://img10.imageshack.us/i/psql2.png/

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jerry Champlin 2009-11-24 14:59:10 Re: DELETE performance problem
Previous Message Pavel Stehule 2009-11-24 12:42:33 Re: Dynamic sql example