How to increase row deletion efficiency?

From: Alexander Stanier <alexander(dot)stanier(at)egsgroup(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: How to increase row deletion efficiency?
Date: 2005-12-07 14:36:16
Message-ID: 4396F360.6080000@egsgroup.com
Lists: pgsql-sql

I am currently trying to separate two environments contained in one
database. Essentially I need to break that one database down into two,
with a portion of the data going to each new database. I intend to
achieve this by duplicating the database and then stripping out of each
copy the data it does not need. I have started by trying to delete data
from a set of 28 related tables, but the performance appears to be
terrible. I am deleting from a table called document, which cascades
down to the 27 tables underneath it via various cascading foreign key
constraints. Some of these subsidiary tables have as many as a couple
of million records.
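For illustration, the structure is roughly like this (table and column
names such as document_page and environment are simplified stand-ins
for the real schema):

    -- Simplified stand-in for the real schema:
    CREATE TABLE document (
        id          integer PRIMARY KEY,
        environment text NOT NULL  -- which environment owns the row
        -- ... many more columns ...
    );

    CREATE TABLE document_page (
        id          integer PRIMARY KEY,
        document_id integer NOT NULL
                    REFERENCES document (id) ON DELETE CASCADE
        -- ... many more columns ...
    );
    -- ... plus 26 more subsidiary tables hanging off document
    -- through similar cascading foreign keys.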

Before executing the DELETE statement on document I tried setting all
constraints deferred within a transaction, but this does not seem to
have helped.
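Concretely, what I ran looked like this (the WHERE clause is a stand-in
for the real selection criteria):

    BEGIN;
    SET CONSTRAINTS ALL DEFERRED;
    DELETE FROM document WHERE environment = 'B';  -- stand-in predicate
    COMMIT;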

I can't work out whether the indexes on these tables are a help or a
hindrance. Presumably any indexes on the foreign key columns should
help, as long as PostgreSQL actually uses them; but given that large
numbers of records are being deleted, the query planner may decide to
do a sequential scan instead. An EXPLAIN doesn't show me what happens
past the delete from document, i.e. whether indexes are used when
cascading. The downside of the indexes is that they have to be
maintained, which could be a lot of work in large-scale deletions.
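My understanding is that each cascaded delete has to find the matching
rows in the child table, so if indexes do get used, making sure every
referencing column has one should at least make that lookup cheap.
Using the simplified names from above, that would mean things like:

    -- One index per referencing column, for each subsidiary table:
    CREATE INDEX document_page_document_id_idx
        ON document_page (document_id);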

What I fear is that for every row deleted from the document table, the
database visits every subsidiary table to delete the data related to
that one row before returning to document to delete another row. This
would mean that every table is visited many times, and if that is how
it works, the large tables are going to be a real problem. The most
efficient way would surely be to delete all the document records, then,
with that list of documents in mind, move on to the next table and
delete all of its related records, so that each table is visited only
once. I was hoping that setting constraints deferred would achieve
this.
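In other words, if the cascades won't work that way, I could do it by
hand, one pass per table, children first (simplified names again):

    BEGIN;
    -- One pass over each subsidiary table, children first:
    DELETE FROM document_page
     WHERE document_id IN (SELECT id FROM document
                            WHERE environment = 'B');
    -- ... repeat for the other 26 subsidiary tables ...
    DELETE FROM document WHERE environment = 'B';
    COMMIT;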

Can anyone advise me on how PostgreSQL (v8.0.3 on MacOS X 10.3) works in
a DELETE statement and what strategy it uses to remove the data?
Can I specify something like "unrecoverable" so that it doesn't write
redo (WAL)?
Are there any indicators I can use to tell me which part of the delete
is taking so much time?
Also, can anyone suggest anything else I can do to speed things up?
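The only measurement I have thought of so far is to run the delete for
real under EXPLAIN ANALYZE inside a transaction and roll it back
afterwards, e.g.:

    BEGIN;
    EXPLAIN ANALYZE
        DELETE FROM document WHERE environment = 'B';
    ROLLBACK;  -- EXPLAIN ANALYZE actually executes the delete, so undo it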

Or perhaps it simply is a lot of work and there is no way round it. My
fallback option is to SELECT the data that I do need rather than DELETE
the data that I don't, but that route means I cannot make use of the
foreign keys.
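That fallback would look something like the following (simplified names
again), with the obvious downside that all the constraints and indexes
have to be re-created on the copies afterwards:

    -- Copy only the rows to keep, one table at a time:
    CREATE TABLE document_keep AS
        SELECT * FROM document WHERE environment = 'A';
    CREATE TABLE document_page_keep AS
        SELECT p.*
          FROM document_page p
          JOIN document_keep d ON p.document_id = d.id;
    -- ... and so on for the other 26 tables, then drop the
    -- originals, rename the _keep tables, and re-create the
    -- foreign keys and indexes.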

Regards,
Alex Stanier.
