Re: Memory reporting on CentOS Linux

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Jeremy Carroll <jeremy(dot)carroll(at)networkedinsights(dot)com>, Matthew Wakeling <matthew(at)flymine(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Memory reporting on CentOS Linux
Date: 2009-08-17 23:43:16
Message-ID: C6AF3924.EFFD%scott@richrelevance.com
Lists: pgsql-performance

On 8/17/09 10:24 AM, "Jeremy Carroll" <jeremy(dot)carroll(at)networkedinsights(dot)com>
wrote:

> I believe this is exactly what is happening. I see that the TOP output lists a
> large amount of VIRT & RES size being used, but the kernel does not report
> this memory as being reserved and instead lists it as free memory or cached.

Oh! I recall running into that fun Linux behavior a while back and
thinking it was a Postgres bug. It has a lot of other bad effects on how
the kernel chooses to swap. I really should have recalled that one. Due
to this behavior, I had initially blamed postgres for "pinning"
shared_buffers memory in the disk cache. But that symptom is really one
of Linux somehow thinking that pages read into shared memory are still
cached (or something similar).

Basically, it thinks that there is more free memory than there really is
when a lot of memory is shared. Run a postgres instance with over 50% of
memory assigned to shared_buffers, and when memory pressure builds, kswapd
goes nuts in CPU use, apparently confused. With a high OS 'swappiness'
value it swaps in and out too much; with a low 'swappiness' it spins on
the CPU, aware on one hand that it is low on memory but confused by the
large apparent amount free, so it doesn't free up much, kswapd chews up
all the CPU, and the system almost hangs. It behaves as if the logic that
determines where to get memory for a process knows that it's almost out,
but the logic that decides what to swap out thinks that there is plenty
free. The larger the ratio of shared memory to total memory in the
system, the higher the kernel's CPU use when managing the buffer cache.
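
To make the double counting concrete, here is a rough sketch of my own
(not from this thread) that compares what /proc/meminfo calls free and
cached against the total size of the SysV shared memory segments, which
is where shared_buffers lives. Shmem pages are counted under 'Cached' on
the kernels I'm describing, but the exact accounting varies by kernel
version, so treat the numbers as approximate:

    # Rough sketch of the accounting confusion described above: SysV
    # shared memory (shared_buffers) shows up under 'Cached' in
    # /proc/meminfo even though it cannot be evicted like file cache.
    # The column layout of /proc/sysvipc/shm can vary by kernel.

    def meminfo_kb(field):
        # Read one field (e.g. 'MemFree' or 'Cached') from /proc/meminfo, in kB.
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith(field + ':'):
                    return int(line.split()[1])
        return 0

    def sysv_shm_kb():
        # Sum the sizes (bytes) of all SysV shared memory segments.
        with open('/proc/sysvipc/shm') as f:
            header = f.readline().split()
            size_col = header.index('size')
            return sum(int(line.split()[size_col]) for line in f) // 1024

    free_plus_cached = meminfo_kb('MemFree') + meminfo_kb('Cached')
    shm = sysv_shm_kb()
    print("MemFree + Cached: %d kB, SysV shm: %d kB" % (free_plus_cached, shm))
    print("'free-looking' memory that is really shared memory: up to %d kB"
          % min(meminfo_kb('Cached'), shm))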

Bottom line is that Linux plus lots of SysV shared memory doesn't work as
well as it should. Setting shared_buffers past 35% of RAM doesn't work
well on Linux. Shared memory accounting is fundamentally broken in Linux
(see some other threads on how the OOM killer works WRT shared memory for
other examples).
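
Purely as my own illustration of that rule of thumb (the 35% figure comes
from this discussion, not from official PostgreSQL docs), a tiny sketch
that turns MemTotal into a shared_buffers ceiling:

    # Hypothetical helper: cap shared_buffers at ~35% of RAM, per the
    # rule of thumb above.  Many people use 25% as a more conservative
    # starting point.
    def shared_buffers_cap_mb(ratio=0.35):
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemTotal:'):
                    return int(int(line.split()[1]) * ratio) // 1024

    print("shared_buffers <= %dMB" % shared_buffers_cap_mb())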

>
> If this is indeed the case, how does one determine if a PostgreSQL instance
> requires more memory? Or how to determine if the system is using memory
> efficiently?

Just be aware that the memory a process definitely uses on its own is
RES-SHR, and that the max SHR value is mostly double-counted in the
'cached' or 'free' figures. That max SHR value IS used by postgres, not
by the OS cache. If cached + free memory is down to roughly the size of
your shared_buffers/SHR, you're pretty much out of memory.
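
If it helps, here is a minimal sketch of mine that computes RES-SHR per
postgres process straight from /proc/<pid>/statm, whose first three
fields are total size, resident, and shared, in pages (see proc(5)). The
process matching is deliberately crude and only meant as an illustration:

    # Sketch: RES-SHR per backend, read from /proc/<pid>/statm.
    # statm fields (in pages): size resident shared text lib data dt.
    # RES-SHR is the memory a process definitely uses on its own, beyond
    # whatever part of the shared segment it has touched.
    import os, resource

    PAGE_KB = resource.getpagesize() // 1024

    def res_minus_shr_kb(pid):
        with open('/proc/%s/statm' % pid) as f:
            size, resident, shared = map(int, f.read().split()[:3])
        return (resident - shared) * PAGE_KB

    def postgres_pids():
        # Crude match on the command line; adjust for your installation.
        for pid in filter(str.isdigit, os.listdir('/proc')):
            try:
                with open('/proc/%s/cmdline' % pid) as f:
                    if 'postgres' in f.read():
                        yield pid
            except IOError:
                pass

    for pid in sorted(postgres_pids(), key=int):
        print("pid %s: %d kB private (RES-SHR)" % (pid, res_minus_shr_kb(pid)))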

Additionally, the OS will start putting things into swap before you reach
that point, so pay attention to the swap used column in top or free. That
is a more reliable indicator than anything else at the system level.
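
For example, the same swap numbers that top and free report can be read
straight out of /proc/meminfo (nothing postgres-specific here, just a
quick sketch):

    # Swap in use = SwapTotal - SwapFree, straight from /proc/meminfo.
    with open('/proc/meminfo') as f:
        mem = dict((line.split(':')[0], int(line.split()[1])) for line in f)
    print("swap used: %d kB" % (mem['SwapTotal'] - mem['SwapFree']))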

If you want to know which postgres process is using the most memory on
its own, look at the DATA and CODE columns in top, or calculate RES-SHR.

I have no idea if more recent Linux Kernels have fixed this at all.

>
> Thanks for the responses.
>
>
> On 8/17/09 6:03 AM, "Matthew Wakeling" <matthew(at)flymine(dot)org> wrote:
>
>> On Sat, 15 Aug 2009, Mark Mielke wrote:
>>> I vote for screwed up reporting over some PostgreSQL-specific
>>> explanation. My understanding of RSS is the same as you suggested
>>> earlier - if 50% RAM is listed as resident, then there should not be
>>> 90%+ RAM free. I cannot think of anything PostgreSQL might be doing
>>> that would influence this to be false.
>>
>> The only thing I would have thought that would allow this would be mmap.
>>
>>> Just for kicks, I tried an mmap() scenario (I do not think PostgreSQL uses
>>> mmap()), and it showed a large RSS, but it did NOT show free memory.
>>
>> More details please. What did you do, and what happened? I would have
>> thought that a large read-only mmapped file that has been read (and
>> therefore is in RAM) would be counted as VIRT and RES of the process in
>> top, but can clearly be evicted from the cache at any time, and therefore
>> would show up as buffer or cache rather than process memory in the totals.
>>
>> +1 on the idea that Linux memory reporting is incomprehensible nowadays.
>>
>> Matthew
>>
>> --
>> There once was a limerick .sig
>> that really was not very big
>> It was going quite fine
>> Till it reached the fourth line
>>
>
