Skip site navigation (1) Skip section navigation (2)

Re: Linux server connection process consumes all memory

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: Pandu Poluan <pandu(at)poluan(dot)info>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ahodgson(at)simkin(dot)ca, pgsql-novice(at)postgresql(dot)org, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: Linux server connection process consumes all memory
Date: 2011-12-07 12:14:21
Message-ID: 4EDF589D.7060003@anatec.com (view raw or flat)
Thread:
Lists: pgsql-novice
On 07/12/2011 11:23, Pandu Poluan wrote:
>
>
> On Dec 7, 2011 4:26 PM, "Ioannis Anagnostopoulos" <ioannis(at)anatec(dot)com 
> <mailto:ioannis(at)anatec(dot)com>> wrote:
> >
> > On 06/12/2011 17:10, Tom Lane wrote:
> >>
> >> Merlin Moncure<mmoncure(at)gmail(dot)com <mailto:mmoncure(at)gmail(dot)com>>  writes:
> >>>
> >>> *) You may want to consider changing your vm over commit settings
> >>> and/or reducing swap in order to get your server to more aggressively
> >>> return OOM to postgres memory allocation.  The specific error returned
> >>> to postgres for an OOM of course would be very helpful.
> >>
> >> Yeah.  I would try starting the postmaster under smaller ulimit 
> settings
> >> so that the kernel gives it ENOMEM before you start getting swapped.
> >> When that happens, the backend will dump a memory map into the
> >> postmaster log that would be very useful for seeing what is actually
> >> happening here.
> >>
> >>                        regards, tom lane
> >>
> > Hello all,
> >
> > I think I have solved the problem. Many thanks for the support and 
> the time you spend. The solution/bug/problem is as follows:
> >
> > 1. There was one connection that as I described was used IN A LOOP 
> 22million times. This connection was assigned a PID x (on the linux 
> server)
> > 2. Nested within this LOOP there was another connection that had 
> been forgotten from past code and the linux server was assigning to it 
> a PID y
> > 3. PID y was of course called also 22million times (since it was in 
> the loop). However it had a nasty bug and it was creating constantly 
> prepared commands! (opps my mistake). So PID y was creating 22million 
> prepared commands!
> > 4. As I had no clue that that there was at all PID y, monitoring the 
> TOP on the server I was presented with the misbehaving PID y but I was 
> of the impression that it was PID x. In fact PID x was below in the 
> list happy doing its own job.
> >
> > So the healthy PID X had a top signature as follows (please note the 
> difference between RES and SHR as well as the magnitude in Mb as 
> Merlin suggested):
> >
> > PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> > 30475 postgres  20   0 2187m 746m 741m S   31  9.5   0:41.48 postgres
> >
> > While the unhealthy PID Y had a TOP signature (please note that RES 
> memory is at 12.9g! and SHR 1.4g as well as the magnitude in Gb!):
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 15965 postgres 20 0 12.9g 6.4g 1.4g S 11 83.4 13:59.15 postgres
> >
> > As I said I had no clue about the existence of PID Y and since it 
> was coming top at the TOP list I had wrongfully assumed that it was 
> the PID X. It gets more complicated by the fact that the test code I 
> sent you, which should have been working fine as it had no nested 
> buggy loop, was mainly running from home over the DSL line thus I 
> never let it conclude its 22million iterations (it would have been 
> still running!) instead I was monitoring the TOP and since the memory 
> was going UP I was wrongfully assuming that I had the same issue (if I 
> had let it run for 2 -3 hours I would have noticed what Merlin 
> suggested about RES/SHR ratio). So it was a misdiagnosis after all :)
> >
> > I hope this explains everything.
> > Kind Regards and sorry for the misunderstanding.
>
> All's well that ends well.
>
> May I ask how you finally found the culprit? Through profiling or code 
> review?
>
> That said, I can't help admiring that PostgreSQL still survives 22 
> million prepared commands without committing suicide. Yes, it's 
> severely impacted, but still survives.
>
> Rgds,
>

When I compared the "demo program" (the one I though it was not working 
as I was not letting it run for long enough) and the actual code base I 
realised that the actual code base was running much slower than the demo 
even though they were, to my mind doing the same things. It took me then 
a few "step into" to find somewhere deep the nasty .prepare() which was 
placed in a function that once was declared static and now it was called 
for every object created (22million of them). In fact it is a little bit 
more complicated as I was in the middle of a major re factoring reducing 
the 22million iteration to 55000 by grouping and creating a few prepared 
instead of executing 22million times a command. So what I am not sure 
still is what will happen if I execute 22million times non-prepared 
statements, will it crash or will it survive since I removed the buggy 
"prepared" that were nested.

Regards

In response to

Responses

pgsql-novice by date

Next:From: Gene PooleDate: 2011-12-07 14:13:10
Subject: Convert / Migrate From Oracle 11gR2 To PostgreSQL ? On CentOS 5.7 x86_64
Previous:From: Pandu PoluanDate: 2011-12-07 11:23:56
Subject: Re: Linux server connection process consumes all memory

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group