From: Marco Colombo <marco(at)esi(dot)it>
To: developer(at)wexwarez(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: unexpected shutdown
Date: 2007-06-20 10:21:45
Message-ID: 4678FFB9.5000605@esi.it
Lists: pgsql-general

developer(at)wexwarez(dot)com wrote:
>> developer(at)wexwarez(dot)com writes:
>>> My database has shut down several times in the last couple of days.
>>> I have no idea why. I am running CentOS and I have not rebooted the
>>> server or made any configuration changes.
>> So in particular, you didn't disable memory overcommit?
>>
>>> LOG: server process (PID 501) was terminated by signal 9
>> If you didn't issue a manual kill -9, then this is almost certainly a
>> trace of the kernel OOM killer at work. Google for "OOM kill" to learn
>> more, or see "memory overcommit" in the PG docs.
>>
>> Memory overcommit is evil on a server.
>>
>> regards, tom lane
>>
>
>
> You guys were right:
> Jun 17 11:04:57 kernel: Out of Memory: Killed process 24928 (postmaster).
>
> I did not disable memory overcommit. I guess this is something I will
> have to do. I have actually never seen this before or heard of memory
> overcommit. I am surprised a setting like this comes enabled by default.
> I read a bit about it and it seems to make sense to disable it, but from
> practical experience do you know of any negative side effects?

The consensus on using vm.overcommit_memory = 2 is far from universal.
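
For reference, this is the knob in question (a sketch; the values shown
are illustrative, and on 2.6 kernels the sysctl is vm.overcommit_memory):

    # check the current mode:
    # 0 = heuristic overcommit (the default), 1 = always overcommit,
    # 2 = strict commit accounting (no overcommit)
    cat /proc/sys/vm/overcommit_memory

    # switch to strict accounting; add to /etc/sysctl.conf to persist
    sysctl -w vm.overcommit_memory=2
    sysctl -w vm.overcommit_ratio=80   # CommitLimit = swap + 80% of RAM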

Your problem is a "java application with memory issues", so I think you
should address that directly first. Either run it elsewhere (turning
the host running PG into a dedicated one), fix its memory leaks, or use
the resource limits provided by the OS to constrain the java app.
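
As a minimal sketch of that last option (the paths and limits here are
made up, adjust to your setup), you can cap the java app's address
space in its startup script:

    # cap virtual memory at 1 GB (ulimit -v takes kilobytes)
    ulimit -v 1048576
    java -jar /path/to/app.jar

or enforce it per user via /etc/security/limits.conf:

    javauser  hard  as  1048576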

Linux kernel people aren't totally clueless about VM. If they chose to
keep overcommitting and the OOM killer enabled by default, there are
reasons.

With overcommitting on, you save a lot of swap space from being
allocated, leaving it for stuff that is actually used and not just
potentially used. The overall system throughput is thus higher.

When it comes to an OOM situation, with overcommitting off things
aren't much better. First, OOM happens much sooner than with
overcommitting on. This usually isn't much of an advantage, since in
95% of cases the OOM is caused by one runaway process, which sooner or
later will cause OOM either way. But on a correctly administered
server, with OS limits configured, a single runaway process doesn't
cause OOM. OOM may still happen under excessive load, and I'd rather
see my system ride out some high load spikes than go into an OOM
situation. So lowering the threshold of what counts as 'excessive
load' isn't necessarily a good idea.
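
If you do go with strict accounting, you can at least see the limit
coming (a quick check on a 2.6 kernel; the figures below are made up):

    $ grep -i commit /proc/meminfo
    CommitLimit:   2097148 kB    <- swap + overcommit_ratio% of RAM
    Committed_AS:  1503244 kB    <- address space currently committed

Once Committed_AS reaches CommitLimit, further malloc() and fork()
calls fail with ENOMEM instead of triggering the OOM killer.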

And OK, let's say you've hit OOM anyway. There's no win-win solution.
Having PG processes SIGKILL'd is quite bad. But sitting in front of a
keyboard watching your system die without being able to log in (OOM, so
fork fails) isn't much better. You may be able to do something (sysrq,
maybe), but the chances you manage to run a proper shutdown are quite
thin in the general case. So you have to choose between the risk of PG
being SIGKILL'd (though the OOM killer _may_ pick the right process
instead) and the risk of being forced to hit the 'reset' button. Either
way, your precious data isn't happy at all.
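
One partial mitigation on 2.6 kernels (a sketch; the postmaster.pid
path depends on your data directory): tell the OOM killer to leave the
postmaster alone via /proc/<pid>/oom_adj, where -17 disables OOM
killing for that process:

    # first line of postmaster.pid is the postmaster's PID;
    # backends inherit the setting when forked
    echo -17 > /proc/$(head -n 1 /var/lib/pgsql/data/postmaster.pid)/oom_adj

Of course that just points the killer at some other process, so it's no
substitute for proper limits.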

So the bottom line is: avoid OOM by properly configuring OS resource
limits. If you don't, then overcommit_memory = 2 is _definitely_
better. If you do, it's a hard call. If you think about it, the funny
thing is that the more experienced the sysadmin you're talking to, the
less experience he has with handling OOM situations. By definition. :)

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
