Skip site navigation (1) Skip section navigation (2)

Re: Very long deletion time on a 200 GB database

From: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
To: sthomas(at)peak6(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very long deletion time on a 200 GB database
Date: 2012-02-27 14:59:57
Message-ID: 4F4B9A6D.1060102@lerner.co.il (view raw or flat)
Thread:
Lists: pgsql-performance
Hi, Shaun.  You wrote:
>
>> In the end, it was agreed that we could execute the deletes over
>> time, deleting items in the background, or in parallel with the
>> application's work. After all, if the disk is filling up at the rate
>> of 2 GB/day, then so long as we delete 4 GB/day (which is pretty easy
>> to do), we should be fine.
>
> Please tell me you understand deleting rows from a PostgreSQL database
> doesn't work like this. :) The MVCC storage system means you'll
> basically just be marking all those deleted rows as reusable, so your
> database will stop growing, but you'll eventually want to purge all the
> accumulated dead rows.

Oh, I understand that all right.  I've had many, *many* conversations 
with this company explaining MVCC.  It doesn't seem to work; when they 
refer to "vacuuming the database," I remind them that we have autovacuum 
working, to which they respond, "Oh, we mean VACUUM FULL."  At which 
point I remind them that VACUUM FULL is almost certainly not what they 
want to do, and then they say, "Yes, we know, but we still like to do it 
every so often."

 From what I understand, the issue isn't one of current disk space, but 
rather of how quickly the disk space is being used up.  Maybe they want 
to reclaim disk space, but it's more crucial to stop the rate at which 
disk space is being taken.  If we were to delete all of the existing 
rows, and let vacuum mark them as dead and available for reuse, then 
that would probably be just fine.

I wouldn't be surprised if we end up doing a CLUSTER at some point.  The 
problem is basically that this machine is in 24/7 operation at 
high-speed manufacturing plants, and the best-case scenario is for a 
4-hour maintenance window.  I've suggested that we might be able to help 
the situation somewhat by attaching a portable USB-based hard disk, and 
adding a new tablespace that'll let us keep running while we divide up 
the work that the disk is doing, but they've made it clear that the 
current hardware configuration cannot and will not change.  Period.

So for now, we'll just try to DELETE faster than we INSERT, and combined 
with autovacuum, I'm hoping that this crisis will be averted.  That 
said, the current state of affairs with these machines is pretty 
fragile, and I think that we might want to head off such problems in the 
future, rather than be surprised by them.

Reuven

-- 
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner

In response to

Responses

pgsql-performance by date

Next:From: Shaun ThomasDate: 2012-02-27 15:01:13
Subject: Re: Very long deletion time on a 200 GB database
Previous:From: Andrew DunstanDate: 2012-02-27 14:53:59
Subject: Re: Very long deletion time on a 200 GB database

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group