
Re: Very long deletion time on a 200 GB database

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
Cc: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very long deletion time on a 200 GB database
Date: 2012-02-24 15:16:50
Message-ID: 4F47A9E2.3030404@peak6.com
Lists: pgsql-performance
On 02/24/2012 08:54 AM, Reuven M. Lerner wrote:

> I/O writes: process1: 820,000, process2: 1 million, process3: 33,000

That's not especially helpful, unfortunately. It doesn't really tell 
us how saturated the controller is. However, I suspect it's being 
effectively slammed, based simply on your CPU usage.

The main problem you're going to run into is that your table is larger 
than the memory in that server. 4GB is really pretty small for a server 
hosting a 200+GB database. The fact that they never meant it to get that 
big doesn't really help you clean it up now.

As a consequence, deleting from that table, or creating a temp table 
with 30 days of data, truncating the table, and re-inserting, is still 
going to cause a lot of disk activity, especially since the database is 
constantly writing out transaction logs. But you do have a few things on 
your side.

You say you're deleting from table B, which has no foreign keys 
referencing it. That's good. You need to go back to your truncate 
approach, and do this:

CREATE TABLE keep_b_data AS
SELECT *
   FROM B
  WHERE some_date >= CURRENT_DATE - INTERVAL '30 days';

TRUNCATE TABLE B;

DROP INDEX idx_something_on_b_1;
DROP INDEX idx_something_on_b_2;
DROP INDEX idx_something_on_b_3;

ALTER TABLE B DROP CONSTRAINT whatever_pk;

INSERT INTO B
SELECT *
   FROM keep_b_data;

ALTER TABLE B ADD CONSTRAINT whatever_pk PRIMARY KEY (some_col);

CREATE INDEX idx_something_on_b_1 ON B (col_a);
CREATE INDEX idx_something_on_b_2 ON B (col_b);
CREATE INDEX idx_something_on_b_3 ON B (col_c);
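
If you want to double-check first that nothing really does reference B 
with a foreign key, a quick catalog query should come back empty. Just 
a sketch, assuming the table really is named b in the current database:

SELECT conname, conrelid::regclass AS referencing_table
   FROM pg_constraint
  WHERE contype = 'f'
    AND confrelid = 'b'::regclass;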

You need to make sure nothing is reading from the table while you're 
doing this, because with the indexes gone, any selects will fall back to 
sequential scans and drive your disk utilization even higher, which you 
definitely don't want. Get a window to work in.
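
If you can't guarantee a quiet window any other way, one option is to 
run the truncate-and-rebuild steps in a single transaction and take an 
exclusive lock up front, so readers simply block until the indexes are 
back. A rough sketch only:

BEGIN;
LOCK TABLE B IN ACCESS EXCLUSIVE MODE;
-- TRUNCATE, DROP INDEX, INSERT, and index/constraint re-creation go here
COMMIT;

Readers will queue behind that lock for the whole run, so this doesn't 
remove the need for a quiet period; it just keeps anyone from seeing the 
table with no indexes.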

But this should be much faster than your original attempts. Inserting 
the 30-day window into table B should be just as fast as creating the 
holding table, and creating the primary key and recreating the indexes 
should take about the same amount of time each.

So to get a *rough* idea of how long it will take, do the first step, 
and create the holding table. Then multiply that time by the number of 
indexes, plus one for the primary key, plus one for the insert itself. 
So if it takes 20 minutes, and you have three indexes plus the primary 
key, multiply by five.
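
To get a feel for how much data those steps have to push around, you 
can also check the table's size up front. Another sketch, again assuming 
the table is simply named b:

SELECT pg_size_pretty(pg_relation_size('b')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('b')) AS total_with_indexes;

The gap between the two numbers is roughly how much index data has to 
be rebuilt.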

I guess the other question is: Is PostgreSQL the only thing running on 
this server? If not, that may be the source of your disk IO, and it's 
choking the database and your ability to clean it up. Try to get them to 
temporarily disable all non-essential services while you do the cleanup. 
I'm wondering if they're also running the app on the Windows machine, 
based on your original story. That in itself isn't a very good design, 
but they're also running a PostgreSQL server on Windows, so who knows 
what they're thinking over there. :)

-- 
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
