Re: Very long deletion time on a 200 GB database

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very long deletion time on a 200 GB database
Date: 2012-02-23 16:37:31
Message-ID: 4F466B4B.4080702@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 02/23/2012 02:39 AM, Reuven M. Lerner wrote:

> I should note that my primary concern is available RAM. The database, as
> I wrote, is about 200 GB in size, and PostgreSQL is reporting (according
> to Windows) use of about 5 GB RAM, plus another 25 GB of virtual memory.

O_o

That... that would probably swap just constantly. No end. Just swap all
day long. But maybe not. Please tell us the values for these settings:

* shared_buffers
* work_mem
* maintenance_work_mem
* checkpoint_segments
* checkpoint_timeout

It also wouldn't be a bad idea to see how many concurrent connections
there are, because that may determine how much memory all the backends
are consuming. In any case, if it's actually using 25GB of virtual
memory, any command you run that doesn't happen to be in cache, will
just immediately join a giant logjam.

> I've told the Windows folks on this project that virtual memory kills a
> database, and that it shouldn't surprise us to have horrible performance
> if the database and operating system are both transferring massive
> amounts of data back and forth. But there doesn't seem to be a good way
> to handle this

You kinda can, by checking those settings and sanitizing them. If
they're out of line, or too large, they'll create the need for more
virtual memory. Having the virtual memory there isn't necessarily bad,
but using it is.

> DELETE FROM B
> WHERE r_id IN (SELECT R.id
> FROM R, B
> WHERE r.end_date < (NOW() - (interval '1 day' * 30))
> AND r.id = b.r_id

Just to kinda help you out syntactically, have you ever tried a DELETE
FROM ... USING? You can also collapse your interval notation.

DELETE FROM B
USING R
WHERE R.id = B.r_id
AND R.end_date < CURRENT_DATE - INTERVAL '30 days';

But besides that, the other advise you've received is sound. Since your
select->truncate->insert attempt was also slow, I suspect you're having
problems with foreign key checks, and updating the index trees.
Maintaining an existing index can be multiples slower than filling an
empty table and creating the indexes afterwards.

So far as your foreign keys, if any of the child tables don't have an
index on the referring column, your delete performance will be
atrocious. You also need to make sure the types of the columns are
identical. Even a numeric/int difference will be enough to render an
index unusable.

We have a 100GB *table* with almost 200M rows and even deleting from
that in many of our archive tests doesn't take anywhere near 9 hours.
But I *have* seen a delete take that long when we had a numeric primary
key, and an integer foreign key. Even a handful of records can cause a
nested loop sequence scan, which will vastly inflate delete time.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2012-02-23 16:51:25 Re: : Cost calculation for EXPLAIN output
Previous Message Steve Crawford 2012-02-23 15:47:36 Re: Very long deletion time on a 200 GB database