Re: Very long deletion time on a 200 GB database

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
Cc: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very long deletion time on a 200 GB database
Date: 2012-02-24 15:16:50
Message-ID: 4F47A9E2.3030404@peak6.com
Lists: pgsql-performance

On 02/24/2012 08:54 AM, Reuven M. Lerner wrote:

> I/O writes: process 1: 820,000; process 2: 1 million; process 3: 33,000

That's not especially helpful, unfortunately; raw write counts don't
really tell us how saturated the controller is. Based on your CPU usage,
though, I suspect it's being effectively slammed.

The main problem you're going to run into is that your table is larger
than the memory in that server. 4GB is really pretty small for a server
hosting a 200+GB database. The fact that nobody meant it to get that big
doesn't help you clean it up now.

As a consequence, either approach (deleting from that table directly, or
creating a temp table with 30 days of data, truncating the table, and
re-inserting) is still going to cause a lot of disk activity, especially
since the database is constantly writing out transaction logs. But you
do have a few things on your side.

You say you're deleting from table B, which has no foreign keys
referencing it. That's good. You need to go back to your truncate
approach, and do this:

CREATE TABLE keep_b_data AS
SELECT *
FROM B
WHERE some_date >= CURRENT_DATE - INTERVAL '30 days';

TRUNCATE TABLE B;

DROP INDEX idx_something_on_b_1;
DROP INDEX idx_something_on_b_2;
DROP INDEX idx_something_on_b_3;

ALTER TABLE B DROP CONSTRAINT whatever_pk;

INSERT INTO B
SELECT *
FROM keep_b_data;

ALTER TABLE B ADD CONSTRAINT whatever_pk PRIMARY KEY (some_col);

CREATE INDEX idx_something_on_b_1 ON B (col_a);
CREATE INDEX idx_something_on_b_2 ON B (col_b);
CREATE INDEX idx_something_on_b_3 ON B (col_c);

You need to make sure nothing is reading from the table while you're
doing this, because with the indexes missing, any selects will fall back
to sequential scans and drive up your disk utilization, which you
definitely don't want. Get a window to work in.
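If you can't guarantee a quiet window by other means, one way to enforce
it is to hold an exclusive lock for the duration. Here's a rough sketch
(table and object names follow the example above; adjust to yours):

```sql
BEGIN;
-- Blocks all readers and writers until COMMIT, so nothing can hit
-- the table while its indexes are missing.
LOCK TABLE B IN ACCESS EXCLUSIVE MODE;

TRUNCATE TABLE B;
-- ... drop the indexes, drop the PK, re-insert from keep_b_data,
--     and rebuild everything, exactly as in the sequence above ...
COMMIT;

-- Refresh planner statistics after the bulk reload.
ANALYZE B;
```

The trade-off is that any client touching table B will simply block
until you're done, so only do this if the application can tolerate that.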

But this should be much faster than your original attempts. Inserting
the 30-day window into table B should be just as fast as creating the
holding table, and creating the primary key and recreating the indexes
should take about the same amount of time each.

So to get a *rough* idea of how long it will take, do the first step,
and create the holding table. Multiply that by the number of indexes and
the primary key, plus 1. So if it takes 20 minutes, and you have three
indexes, and the primary key, multiply by five.
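You can calibrate that estimate directly from psql, using \timing to see
the elapsed time of the first step (same hypothetical column names as
above):

```sql
-- Report elapsed time for each statement.
\timing on

CREATE TABLE keep_b_data AS
SELECT *
FROM B
WHERE some_date >= CURRENT_DATE - INTERVAL '30 days';
-- Suppose this reports roughly 20 minutes. With three indexes plus
-- the primary key, budget about 20 * (3 + 1 + 1) = 100 minutes for
-- the whole operation.
```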

I guess the other question is: Is PostgreSQL the only thing running on
this server? If not, that may be the source of your disk IO, and it's
choking the database and your ability to clean it up. Try to get them to
temporarily disable all non-essential services while you do the cleanup.
I'm wondering if they're also running the app on the Windows machine,
based on your original story. That in itself isn't a very good design,
but they're also running a PostgreSQL server on Windows, so who knows
what they're thinking over there. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
