Re: System overload / context switching / oom, 8.3

From: Rob Lemley <rclemley(at)yahoo(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: System overload / context switching / oom, 8.3
Date: 2010-02-04 01:27:48
Message-ID: 4B6A2294.7040107@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andy Colson wrote:
> > work_mem = 32MB
> > maintenance_work_mem = 64MB
>
>
> if you have lots and lots of connections, you might need to cut these
> down?

definitely, work_mem is the main focus.

If I understand correctly, th 64MB maintenance_work_mem is per vacuum
task, and on this system there are 3 autovacuums. I was wondering if
with this many databases, possibly decreasing the maintenance_work_mem
significantly and starting up more autovacuums.

Yes, also moving databases to other servers in order to decrease the
number of connections.
>
> > effective_cache_size = 5000MB
>
> I see your running a 32bit, but with bigmem support, but still, one
> process is limited to 4gig. You'd make better use of all that ram if
> you switched to 64bit. And this cache, I think, would be limited to
> 4gig.
All of the cache is being used because the operating system kernel is
built with the memory extensions to access outside the 32bit range.
This is the cache size reported by free(1). However, there may be
advantages to switch to 64bit.

>
> The oom-killer is kicking in, at some point, so somebody is using too
> much ram. There should be messages or logs or something, right?
> (I've never enabled the oom stuff so dont know much about it). But
> the log messages might be helpful.
>
> Also, do you know what the oom max memory usage is set to? You said:
> "oom_adj -17. vm_overcommit_memory set to 2, but at this time
> vm_overcommit_ratio was still at 50 (has since been changed to 90,
> should this be 100?)"

Oh man. I encourage everyone to find out what /proc/<pid>/oom_adj
means. You have to set this to keep the Linux "oom-killer" from doing a
kill -9 on postgres postmaster. On Debian:

echo -17 >> /proc/$(cat /var/run/postgresql/8.3-main.pid)/oom_adj

This is my experience with oom-killer. After putting -17 into
/proc/pid/oom_adj, oom-killer seemed to kill one of the database
connection processes. Then the postmaster attempted to shut down all
processes because of possible shared memory corruption. The database
then went into recovery mode. After stopping the database some of the
processes were stuck and could not be killed. The operating system was
rebooted and the database returned with no data loss.

My earlier experience with oom-killer: If you don't have this setting in
oom_adj, then it seems likely (certain?) that oom-killer kills the
postmaster because of the algorithm oom-killer uses (called badness())
which adds children process scores to their parent's scores. I don't
know if sshd was killed but I don't think anyone could log in to the
OS. After rebooting there was a segmentation violation when trying to
start the postmaster. I don't think that running pg_resetxlog with
defaults is a good idea. My colleague who has been investigating the
crash believes that we could have probably eliminated at least some of
the data loss with more judicious use of pg_resetxlog.

There was a discussion on the postgres lists about somehow having the
postgres distribution include the functionality to set oom_adj on
startup. To my knowledge, that's not in 8.3 so I wrote a script and
init.d script to do this on Debian systems.

As far as vm.over_commit memory goes, there are three settings and most
recommend setting it to 2 for postgres. However, this does not turn off
oom-killer! You need to put -17 in /proc/<pid>/oom_adj whether you do
anything about vm.over_commit memory or not We had vm_overcommit_memory
set to 2 and oom-killer became active and killed the postmaster.

Kind of off-topic, but a Linux kernel parameter that's often not set on
database servers is elevator=deadline which sets up the io scheduling
algorithm. The algorithm can be viewed/set at runtime for example the
disk /dev/sdc in /sys/block/sdc/queue/scheduler.

Rob

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2010-02-04 02:17:51 Re: System overload / context switching / oom, 8.3
Previous Message Greg Smith 2010-02-03 21:59:31 Re: Optimizing Postgresql server and FreeBSD for heavy read and writes