On 3/26/10 4:57 PM, Richard Yen wrote:
> Hi everyone,
> We've recently encountered some swapping issues on our CentOS 64GB Nehalem machine, running postgres 8.4.2. Unfortunately, I was foolish enough to set shared_buffers to 40GB. I was wondering if anyone would have any insight into why the swapping suddenly starts, but never recovers?
> <img src="http://richyen.com/i/swap.png">
> Note, the machine has been up and running since mid-December 2009. It was only on March 8 that this swapping began, and it's never recovered.
> If we look at dstat, we find the following:
> <img src="http://richyen.com/i/dstat.png">
> Note that it is constantly paging in, but never paging out.
This happens when you have too many processes using too much space to fit in real memory, but none of them are changing their memory image. If the system swaps a process in, but that process doesn't change anything in memory, then there are no dirty pages and the kernel can just kick the process out of memory without writing anything back to the swap disk -- the data in the swap are still valid.
It's a classic problem when processes are running round-robin. Say you have space for 100 processes, but you're running 101 processes. When you get to #101, #1 is the oldest, so it gets swapped out. Then #1 runs, and #2 is the oldest, so it gets kicked out. Then #2 runs and kicks out #3 ... and so forth. Going from 100 to 101 processes brings the system nearly to a halt.
Some operating systems try to use tricks to keep this from happening, but it's a hard problem to solve.