Re: Optimizing DELETE

From: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>
To: Ivan Voras <ivoras(at)fer(dot)hr>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimizing DELETE
Date: 2006-09-19 14:17:25
Message-ID: 1158675445.9657.1587.camel@archimedes
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You do not have indexes on all of the columns which are linked by
foreign key constraints.

For example, let's say that I had a "scientist" table with a single
column "scientist_name" and another table "discovery" which had
"scientist_name" as a column with a foreign key constraint to the
"scientist" table.

If the system were to try to delete a row from the scientist table, then
it would need to scan the discovery table for any row which referenced
that scientist_name.

If there is an index on the scientist_name column in the discovery
table, this is a fast operation. In your case however, there most
likely isn't an index on that column, so it needs to do a full table
scan of the discovery table for each row deleted from the scientist
table.

If the discovery table has 100,000 rows, and there are 100 scientists,
then deleting those 100 scientists would require scanning 100,000 * 100
= 10M records, so this sort of thing can quickly become a very expensive
operation.

Because of this potential for truly atrocious update/delete behavior,
some database systems (SQL Server at least, and IIRC Oracle as well)
either automatically create the index on discovery.scientist_name when
the foreign key constraint is created, or refuse to create the foreign
key constraint if there isn't already an index.

PG doesn't force you to have an index, which can be desirable for
performance reasons in some situations if you know what you're doing,
but allows you to royally shoot yourself in the foot on deletes/updates
to the parent table if you're not careful.

If you have a lot of constraints and want to track down which one is
unindexed, then doing an EXPLAIN ANALYZE of deleting a single row from
the parent table will tell you how long each of the referential
integrity checks takes, so you can figure out which indexes are missing.

-- Mark Lewis

On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered
> DELETE) on a trivially small table but with many foreign key references
> (on similar-sized tables), and I'm waiting for it to finish. It's been
> 10 minutes now, which seems very excessive for a table of 9000 rows on a
> 3 GHz desktop machine.
>
> 'top' says it's all spent in USER time, and there's a ~~500KB/s write
> rate going on. Just before this DELETE, I've deleted data from a larger
> table (50000 rows) using the same method and it finished in couple of
> seconds - maybe it's a PostgreSQL bug?
>
> My question is: assuming it's not a bug, how to optimize DELETEs?
> Increasing work_mem maybe?
>
> (I'm using PostgreSQL 8.1.4 on FreeBSD 6- amd64)
>
> (I know about TRUNCATE; I need those foreign key references to cascade)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ivan Voras 2006-09-19 20:39:42 Re: Optimizing DELETE
Previous Message Csaba Nagy 2006-09-19 14:15:29 Re: Optimizing DELETE