Very long deletion time on a 200 GB database

From: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
To: pgsql-performance(at)postgresql(dot)org
Subject: Very long deletion time on a 200 GB database
Date: 2012-02-23 08:39:49
Message-ID: 4F45FB55.1020600@lerner.co.il
Lists: pgsql-performance

Hi, everyone. I'm maintaining an application that exists as a "black
box" in manufacturing plants. The system is based on Windows, .NET, and
PostgreSQL 8.3. I'm a Unix kind of guy myself, but the application
layer and system administration are being handled by other people; I'm
just the PostgreSQL guy.

Because of the nature of the application, we don't have direct control
over what happens. And it turns out that at one installation, we're
quickly running out of disk space. The database is already taking up
about 200 GB of space, and is growing by 1 GB or so a day. Switching
disks, either to a larger/faster traditional drive, or even to an SSD, is
not an option. (And yes, I know that SSDs have their own risks, but I'm
just throwing that out as one option.)

Right now, the best solution to the space problem is to delete
information associated with old records, where "old" means at least 30
days old. The old records are spread across a few tables, including
many large objects. (The application was written by people who were new
to PostgreSQL, and didn't realize that they could use BYTEA.)
Basically, given a foreign key B.r_id that points to table R, I want to
DELETE all rows in B where R's creation date is at least 30 days ago.

Unfortunately, when we implemented this simple delete, it executed
slower than molasses, taking about 9 hours to do its thing. Not only
does this seem like a really, really long time for such a delete, but
we have only a 4-hour window in which to run this maintenance activity,
before the factory starts to use our black box again.

I've tried a few approaches so far, none of which have been hugely
successful. The fact that it takes several hours to test each theory is
obviously a bit of a pain, and so I'm curious to hear suggestions from
people here.

I should note that my primary concern is available RAM. The database,
as I wrote, is about 200 GB in size, and Windows reports that PostgreSQL
is using about 5 GB of RAM, plus another 25 GB of virtual memory. I've
told the Windows folks on this project that
virtual memory kills a database, and that it shouldn't surprise us to
have horrible performance if the database and operating system are both
transferring massive amounts of data back and forth. But there doesn't
seem to be a good way to handle this.

This is basically what I'm trying to execute:

DELETE FROM B
 WHERE r_id IN (SELECT R.id
                  FROM R, B
                 WHERE r.end_date < (NOW() - (interval '1 day' * 30))
                   AND r.id = b.r_id);

(1) I tried to write this as a join, rather than a subselect. But B has
an oid column that points to large objects, and on which we have a rule
that removes the associated large object when a row in B is removed.
Doing the delete as a join resulted in "no such large object with an oid
of xxx" errors. (I'm not sure why, although it might have to do with
the rule.)
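
For reference, the join version was roughly along these lines (this is a
sketch from memory, not the exact statement we ran):

DELETE FROM B
 USING R
 WHERE b.r_id = r.id
   AND r.end_date < (NOW() - (interval '1 day' * 30));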

(2) I tried to grab the rows that *do* interest me, put them into a
temporary table, TRUNCATE the existing table, and then copy the rows
back. I only tested that with a 1 GB subset of the data, but that took
longer than other options.
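
In rough outline, that attempt looked like this (the temporary table
name is just a placeholder):

CREATE TEMPORARY TABLE b_keep AS
    SELECT b.*
      FROM B b
      JOIN R r ON r.id = b.r_id
     WHERE r.end_date >= (NOW() - (interval '1 day' * 30));

TRUNCATE B;

INSERT INTO B SELECT * FROM b_keep;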

(3) There are some foreign-key constraints on the B table. I thought
that perhaps doing a mass DELETE was queueing up all of those
constraints, and possibly using up lots of memory and/or taking a long
time to execute. I thus rewrote my queries such that they first removed
the constraints, then executed the DELETE, and then restored the
constraints. That didn't seem to improve things much either, and took a
long time (30 minutes) just to remove the constraints. I expected
re-adding the constraints to take a while, but shouldn't removing them
be relatively quick?
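
In outline, it was something like the following, with one DROP/ADD pair
per foreign key on B (the constraint and column names here are made up
for illustration):

ALTER TABLE B DROP CONSTRAINT b_r_id_fkey;
-- ... repeated for each foreign-key constraint on B ...

DELETE FROM B
 WHERE r_id IN (SELECT r.id
                  FROM R
                 WHERE r.end_date < (NOW() - (interval '1 day' * 30)));

ALTER TABLE B ADD CONSTRAINT b_r_id_fkey
    FOREIGN KEY (r_id) REFERENCES R (id);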

(4) I tried "chunking" the deletes, such that instead of trying to
delete all of the records from the B table, I would instead delete just
those associated with 100 or 200 rows from the R table. On a 1 GB
subset of the data, this seemed to work just fine. But on the actual
database, it was still far too slow.
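
The chunked version was essentially repeated runs of something like this
(driven from a script, rerun until the DELETE affects zero rows; the
LIMIT is whatever chunk size we were testing at the time):

DELETE FROM B
 WHERE r_id IN (SELECT r.id
                  FROM R
                 WHERE r.end_date < (NOW() - (interval '1 day' * 30))
                 LIMIT 200);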

I've been surprised by the time it takes to delete the records in
question. I keep trying to tell the others on this project that
PostgreSQL isn't inherently slow, but that a 200 GB database running on
a non-dedicated machine, with an old version (8.3), and while it's
swapping RAM, will be slow regardless of the database software we're
using. But even so, 9 hours to delete 100 GB of data strikes me as a
very long process.

Again, I continue to believe that given our hard deadlines, and the
fact that we're using a large amount of virtual memory, there isn't
really a solution that will work quickly and easily. But I'd be
delighted to be wrong, and welcome any and all comments and suggestions
for how to deal with this.

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
