Re: Linux server connection process consumes all memory

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org, ahodgson(at)simkin(dot)ca
Subject: Re: Linux server connection process consumes all memory
Date: 2011-12-07 09:24:38
Message-ID: 4EDF30D6.20705@anatec.com
Lists: pgsql-novice

On 06/12/2011 17:10, Tom Lane wrote:
> Merlin Moncure<mmoncure(at)gmail(dot)com> writes:
>> *) You may want to consider changing your vm over commit settings
>> and/or reducing swap in order to get your server to more aggressively
>> return OOM to postgres memory allocation. The specific error returned
>> to postgres for an OOM of course would be very helpful.
> Yeah. I would try starting the postmaster under smaller ulimit settings
> so that the kernel gives it ENOMEM before you start getting swapped.
> When that happens, the backend will dump a memory map into the
> postmaster log that would be very useful for seeing what is actually
> happening here.
>
> regards, tom lane
>
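Tom's suggestion can be tried without touching the PostgreSQL configuration. The sketch below assumes that "smaller ulimit settings" means the virtual address-space limit (RLIMIT_AS, which is what `ulimit -v` sets in bash). It lowers that limit in a child process just before exec, so an oversized allocation fails with ENOMEM instead of pushing the machine into swap. The helper name `run_with_memory_limit` and the 512 MiB figure are illustrative only. For a real postmaster the limit would have to sit well above `shared_buffers`, because the shared memory segment is mapped into every backend's address space.

```python
import resource
import subprocess
import sys


def _limit_address_space(limit_bytes: int) -> None:
    """Runs in the child just before exec: cap its virtual address space."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process may lower its hard limit but never raise it.
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))


def run_with_memory_limit(cmd: list[str], limit_bytes: int) -> subprocess.CompletedProcess:
    """Run cmd with RLIMIT_AS lowered, roughly like `ulimit -v` in a shell (hypothetical helper)."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: _limit_address_space(limit_bytes),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Demo: a child Python process tries to allocate 2 GiB under a 512 MiB cap.
    # The kernel refuses the mapping (ENOMEM), which Python reports as MemoryError.
    result = run_with_memory_limit(
        [sys.executable, "-c", "bytearray(2 * 1024**3)"],
        limit_bytes=512 * 1024**2,
    )
    print("exit code:", result.returncode)
    print(result.stderr.strip().splitlines()[-1])
```

Starting the server through a wrapper like this (or running `ulimit -v` in the shell that launches it) means a runaway backend hits an out-of-memory error early and, as Tom describes, writes its memory map to the postmaster log.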
Hello all,

I think I have solved the problem. Many thanks for the support and the
time you spent. The solution/bug/problem is as follows:

1. There was one connection that, as I described, was used IN A LOOP
22 million times. This connection was assigned PID X on the Linux server.
2. Nested within this LOOP there was another connection, left over from
old code I had forgotten about, and the Linux server assigned it PID Y.
3. PID Y was of course also called 22 million times (since it was in the
loop). However, it had a nasty bug: it kept creating prepared commands
(oops, my mistake!). So PID Y created 22 million prepared commands!
4. As I had no idea that PID Y existed at all, when monitoring top on the
server I saw the misbehaving PID Y but was under the impression that it
was PID X. In fact PID X was further down the list, happily doing its
own job.
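The bug in steps 2 and 3 can be illustrated with a toy model. `FakeSession` below is a hypothetical stand-in for one server connection, not any real driver's API; it only counts server-side prepared statements, the way a PostgreSQL session keeps each prepared statement until `DEALLOCATE` or disconnect (a session's statements are listed in the `pg_prepared_statements` view). Preparing inside the loop leaves one statement behind per iteration, while preparing once and executing many times keeps a single one. The table name in `SQL` is likewise made up.

```python
from itertools import count


class FakeSession:
    """Hypothetical model of one backend: tracks its server-side prepared statements."""

    def __init__(self) -> None:
        self.prepared: dict[str, str] = {}
        self._names = count(1)

    def prepare(self, sql: str) -> str:
        # Each PREPARE creates a new named statement that lives for the whole session.
        name = f"stmt_{next(self._names)}"
        self.prepared[name] = sql
        return name

    def execute(self, name: str, params: tuple) -> None:
        if name not in self.prepared:
            raise KeyError(f"prepared statement {name!r} does not exist")

    def deallocate(self, name: str) -> None:
        del self.prepared[name]


SQL = "INSERT INTO positions VALUES ($1, $2)"  # hypothetical statement


def buggy_loop(session: FakeSession, rows: list[tuple]) -> None:
    # Prepares a fresh statement on every iteration and never deallocates it.
    for row in rows:
        name = session.prepare(SQL)
        session.execute(name, row)


def fixed_loop(session: FakeSession, rows: list[tuple]) -> None:
    # Prepares once, executes many times, then releases the statement.
    name = session.prepare(SQL)
    try:
        for row in rows:
            session.execute(name, row)
    finally:
        session.deallocate(name)
```

In this model, running `buggy_loop` over 1,000 rows leaves 1,000 entries in `session.prepared`, while `fixed_loop` leaves none; at 22 million iterations the buggy pattern leaves 22 million statements in a single backend.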

So the healthy PID X had a top signature as follows (please note how
small the difference between RES and SHR is, 746m vs 741m, and that the
sizes are in MB, as Merlin suggested):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
30475 postgres  20   0 2187m 746m 741m S   31  9.5  0:41.48 postgres

While the unhealthy PID Y had a top signature like this (please note that
VIRT is at 12.9g and RES at 6.4g, while SHR is only 1.4g, and that the
sizes are now in GB!):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
15965 postgres  20   0 12.9g 6.4g 1.4g S   11 83.4 13:59.15 postgres
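The comparison above can be made mechanical. A PostgreSQL backend's SHR column largely reflects the shared buffers it has touched, so the more telling figure is RES minus SHR, roughly the memory private to that process. The helper below is a sketch that parses top's memory columns (plain numbers in KiB, or with `m`/`g`/`t` suffixes, as in the output above) and computes that difference; the function names are illustrative, not part of any tool.

```python
# Multipliers from top's unit suffixes to KiB (top shows plain numbers in KiB).
_UNITS_KIB = {"": 1, "k": 1, "m": 1024, "g": 1024**2, "t": 1024**3}


def top_size_to_kib(field: str) -> float:
    """Convert a top memory column such as '746m', '6.4g' or '123456' to KiB."""
    field = field.strip().lower()
    suffix = field[-1] if field[-1].isalpha() else ""
    number = field[: len(field) - len(suffix)]
    return float(number) * _UNITS_KIB[suffix]


def private_mib(res: str, shr: str) -> float:
    """Resident memory not counted as shared (RES - SHR), in MiB."""
    return (top_size_to_kib(res) - top_size_to_kib(shr)) / 1024
```

For the two rows above, `private_mib("746m", "741m")` gives 5.0 MiB for the healthy backend, while `private_mib("6.4g", "1.4g")` gives about 5120 MiB for the runaway one, which is the signature of memory piling up inside a single process.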

As I said, I had no idea that PID Y existed, and since it was at the top
of the top list I wrongly assumed that it was PID X. It gets more
complicated: the test code I sent you, which should have been working
fine since it had no nested buggy loop, was mostly run from home over a
DSL line, so I never let it complete its 22 million iterations (it would
still be running!). Instead I was watching top, and since the memory kept
going UP I wrongly assumed I had the same issue (had I let it run for 2-3
hours I would have noticed what Merlin suggested about the RES/SHR
ratio). So it was a misdiagnosis after all :)
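Part of the misdiagnosis was simply not knowing that PID Y existed. The sketch below lists every process with a given command name, read from Linux `/proc/<pid>/comm`, together with its resident and shared memory from `/proc/<pid>/statm` (which reports sizes in pages), sorted by RES minus SHR. Such a listing shows all backends side by side rather than only the one at the top of `top`. It assumes a Linux `/proc` filesystem and that the backends are named `postgres`, as in the top output above; `ProcMem`, `read_statm` and `processes_named` are hypothetical names.

```python
import os
from dataclasses import dataclass

PAGE_KIB = os.sysconf("SC_PAGE_SIZE") // 1024


@dataclass
class ProcMem:
    pid: int
    resident_kib: int
    shared_kib: int

    @property
    def private_kib(self) -> int:
        # Approximation: "shared" in statm also counts file-backed pages.
        return self.resident_kib - self.shared_kib


def read_statm(pid: int) -> ProcMem:
    """Read resident and shared sizes (in pages) from /proc/<pid>/statm."""
    with open(f"/proc/{pid}/statm") as f:
        _size, resident, shared = (int(x) for x in f.read().split()[:3])
    return ProcMem(pid, resident * PAGE_KIB, shared * PAGE_KIB)


def processes_named(name: str) -> list[ProcMem]:
    """All live processes whose comm equals name, largest private memory first."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() != name:
                    continue
            found.append(read_statm(int(entry)))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited or is not readable; skip it
    return sorted(found, key=lambda p: p.private_kib, reverse=True)


if __name__ == "__main__":
    for p in processes_named("postgres"):
        print(f"{p.pid:>7} RES={p.resident_kib // 1024}m "
              f"SHR={p.shared_kib // 1024}m private={p.private_kib // 1024}m")
```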

I hope this explains everything.
Kind Regards and sorry for the misunderstanding.
Yiannis
