
Re: Very long deletion time on a 200 GB database

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very long deletion time on a 200 GB database
Date: 2012-02-23 16:37:31
Message-ID: 4F466B4B.4080702@peak6.com
Lists: pgsql-performance
On 02/23/2012 02:39 AM, Reuven M. Lerner wrote:

> I should note that my primary concern is available RAM. The database, as
> I wrote, is about 200 GB in size, and PostgreSQL is reporting (according
> to Windows) use of about 5 GB RAM, plus another 25 GB of virtual memory.

O_o

That... that would probably swap just constantly. No end. Just swap all 
day long. But maybe not. Please tell us the values for these settings:

* shared_buffers
* work_mem
* maintenance_work_mem
* checkpoint_segments
* checkpoint_timeout
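
If it's easier than digging through postgresql.conf, something roughly
like this from psql should list them all in one shot:

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'checkpoint_segments', 'checkpoint_timeout');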

It also wouldn't be a bad idea to see how many concurrent connections 
there are, because that may determine how much memory all the backends 
are consuming. In any case, if it's actually using 25GB of virtual 
memory, any command you run that doesn't happen to be in cache will just 
immediately join a giant logjam.
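
A rough count is enough, something like:

SELECT count(*) AS connections
  FROM pg_stat_activity;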

> I've told the Windows folks on this project that virtual memory kills a
> database, and that it shouldn't surprise us to have horrible performance
> if the database and operating system are both transferring massive
> amounts of data back and forth. But there doesn't seem to be a good way
> to handle this

You kinda can, by checking those settings and sanitizing them. If 
they're out of line, or too large, they'll create the need for more 
virtual memory. Having the virtual memory there isn't necessarily bad, 
but using it is.

> DELETE FROM B
> WHERE r_id IN (SELECT R.id
> FROM R, B
> WHERE r.end_date < (NOW() - (interval '1 day' * 30))
> AND r.id = b.r_id);

Just to kinda help you out syntactically, have you ever tried a DELETE 
FROM ... USING? You can also collapse your interval notation.

DELETE FROM B
  USING R
  WHERE R.id = B.r_id
    AND R.end_date < CURRENT_DATE - INTERVAL '30 days';

But besides that, the other advice you've received is sound. Since your 
select->truncate->insert attempt was also slow, I suspect you're having 
problems with foreign key checks, and updating the index trees. 
Maintaining an existing index can be multiples slower than filling an 
empty table and creating the indexes afterwards.
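
For what it's worth, here's a rough sketch of that approach (assuming 
every row in B has a matching row in R, and that nothing else references 
B itself):

BEGIN;
CREATE TABLE b_new (LIKE B INCLUDING DEFAULTS);

INSERT INTO b_new
SELECT b.*
  FROM B b
  JOIN R r ON r.id = b.r_id
 WHERE r.end_date >= CURRENT_DATE - INTERVAL '30 days';

DROP TABLE B;
ALTER TABLE b_new RENAME TO B;
COMMIT;

Then recreate B's indexes and foreign keys against the freshly filled 
table, which is where the speedup comes from.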

So far as your foreign keys, if any of the child tables don't have an 
index on the referring column, your delete performance will be 
atrocious. You also need to make sure the types of the columns are 
identical. Even a numeric/int difference will be enough to render an 
index unusable.
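
In your case that means an index on the referring column in B (the index 
name here is just a placeholder):

CREATE INDEX CONCURRENTLY b_r_id_idx ON B (r_id);

A quick \d B and \d R in psql will also tell you whether B.r_id and R.id 
are really the same type.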

We have a 100GB *table* with almost 200M rows and even deleting from 
that in many of our archive tests doesn't take anywhere near 9 hours. 
But I *have* seen a delete take that long when we had a numeric primary 
key, and an integer foreign key. Even a handful of records can cause a 
nested loop sequence scan, which will vastly inflate delete time.
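
An easy way to spot that is to run the delete under EXPLAIN first; a type 
mismatch or missing index tends to show up as a Seq Scan on B inside a 
Nested Loop rather than an index scan:

EXPLAIN
DELETE FROM B
 USING R
 WHERE R.id = B.r_id
   AND R.end_date < CURRENT_DATE - INTERVAL '30 days';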

-- 
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)peak6(dot)com

