Re: Getting OOM errors from PostgreSQL

From: Chris Lajoie <chris(at)etriptrader(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Getting OOM errors from PostgreSQL
Date: 2016-03-07 18:30:35
Message-ID: 56DDC8CB.8050204@etriptrader.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 03/05/2016 09:12 AM, Kevin Grittner wrote:
> On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris(at)etriptrader(dot)com> wrote:
>> In my log I have been getting errors that look like this:
>> LOG: could not fork new process for connection: Cannot allocate memory
>>
>> This seems to only happen during periods of high activity.
>> Shortly after the last one occurred I checked the free mem and got
>> this:
>> # free -h
>> total used free shared buff/cache available
>> Mem: 9.4G 737M 283M 2.1G 8.4G 6.3G
>> Swap: 2.0G 8.3M 2.0G
>>
>> From this I see the vast majority of the 10GB of memory is being
>> used by the OS cache. If I understand how it works correctly, this
>> means it is basically free memory because if an application needs
>> it the kernel will free some of it for use... is my understanding
>> correct?
> That depends. In general, this report is light on useful detail;
> you might want to review this page and fill in some of the missing
> info, like OS version:
>
> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
Appologies for not providing more useful info. I will endeavor to do so
in the future.
> The most important missing information can be captured by running
> these shell statements:
>
> cat /proc/sys/vm/overcommit_memory
> cat /proc/sys/vm/overcommit_ratio
>
> If overcommit_memory is 2, you will get "out of memory" when an
> allocation would push allocated memory past swap space (2GB in your
> case) plus the percentage of machine RAM indicated by
> overcommit_ratio. Effectively, the difference between
> overcommit_ratio and 100 is the percentage of RAM you are reserving
> for the OS to keep things memory-resident (like buffers and cache).
> A value of 20, for example, would mean that an allocation that
> would reduce OS memory-resident pages below 80% of machine RAM will
> trigger an "out of memory" error; so when you set
> vm.overcommit_memory = 2 (which is generally a very good thing to
> do when running PostgreSQL on the machine) you should raise
> vm.overcommit_ratio = 80 or 90. You don't want to totally blow
> away OS cache, but you probably want to survive other demands on
> RAM that bring it down quite a bit.
Despite not having provided more information you seem to have gotten to
the crux of my problem anyway. As I was a little desperate to resolve
the problem, I ended up changing various things in an attempt to just
"make it go away" for now. I set overcommit_memory back to 0 and
shared_buffers = 500MB. I believe this has the effect of just allowing
the OS to cache things as it sees fit, and only reserving 500MB for
postgres to explicitly cache whatever it wants. But I did not have any
more OOM errors over the weekend.

Your explanation of how overcommit_ratio affects what the OS will allow
when overcommit_memory=2 is enlightening, but I am still not clear on
some specifics. For example I thought postgres allocates the shared
buffers on startup. If this is the case, why did I get OOM errors on
connections? Was that particular connection simply pushing the memory
past the amount established by the overcommit_ratio? overcommit_ratio
was at its default (which is apparently 50 looking at
/proc/sys/vm/overcommit_ratio).

At this point I am not sure if I should change anything... I have seen
conflicting advice at various places on the internet. Some suggest
(assuming this is a dedicated database system) to use up a large chunk
of RAM with shared_buffers, leaving enough for work_mem*max_connections
+ some extra. Others suggest allowing the OS to cache what it wants and
set shared_buffers to a smaller amount, allowing postgres to permanently
cache only the most frequently accessed smaller tables.

I would greatly appreciate your advice on this.
I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7).

Thank you,
Chris

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message lis_ba 2016-03-07 21:21:42 Restringir acesso ao pg_catalog
Previous Message Stephen Kuntz 2016-03-07 18:23:17 Re: WAL replay asking for very old WAL