Re: 8.4-vintage problem in postmaster.c

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.4-vintage problem in postmaster.c
Date: 2010-11-24 18:16:20
Message-ID: 4CED5674.5070800@kaltenbrunner.cc
Lists: pgsql-hackers

On 11/15/2010 03:24 PM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of Sat Nov 13 19:07:50 -0300 2010:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> On 11/13/2010 06:58 PM, Tom Lane wrote:
>>>> Just looking at it, I think that the logic in canAcceptConnections got
>>>> broken by somebody in 8.4, and then broken some more in 9.0: in some
>>>> cases it will return an "okay to proceed" status without having checked
>>>> for TOOMANY children. Was this system possibly in PM_WAIT_BACKUP or
>>>> PM_HOT_STANDBY state? What version was actually running?
>>
>>> I don't have too many details on the actual setup (working on that) but
>>> the box in question is running 8.4.2 and had no issues before the
>>> upgrade to 8.4 (i.e. 8.3 was reported to work fine, so an 8.4+ breakage
>>> looks plausible).
>>
>> Well, this failure would certainly involve a flood of connection
>> attempts, so it's possible it's a pre-existing bug that they just did
>> not happen to trip over before. But the sequence of events that I'm
>> thinking about is a smart shutdown attempt (SIGTERM to postmaster)
>> while an online backup is in progress, followed by a flood of
>> near-simultaneous connection attempts while the backup is still active.
>
> As far as I could gather from Stefan's description, I think this is
> pretty unlikely. It seems to me that the "too many children" error
> message is very common in the 8.3 setup already, and the only reason
> they have a problem on 8.4 is that it crashes instead.

Not sure if that is true, but 8.4 crashes whereas 8.3 (seemingly) just
works. The issue is still there with 8_4_STABLE...

DEBUG3 level output (the last few hours, 7MB in size) is available at
http://www.kaltenbrunner.cc/files/postgresql-2010-11-24_143513.log

From looking at the code I'm not immediately seeing what is going wrong
here, but maybe somebody else has an idea.
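
For reference, the kind of breakage Tom suspects above (an "okay to
proceed" status returned without the TOOMANY check) can be sketched as
follows. This is only an illustration and not the postmaster.c code: the
function names, signatures, enum layout and the child-count parameters are
all assumptions made up for the example, loosely modelled on the state
names mentioned in this thread.

```c
#include <assert.h>

/* Hypothetical states and results, named after those in the thread. */
typedef enum { PM_RUN, PM_WAIT_BACKUP, PM_HOT_STANDBY, PM_SHUTDOWN } PMState;
typedef enum { CAC_OK, CAC_SHUTDOWN, CAC_WAITBACKUP, CAC_TOOMANY } CAC_state;

/*
 * Suspected shape of the bug (assumption): the early return for
 * PM_WAIT_BACKUP hands back a "may proceed" status before the
 * child-count limit has been looked at.
 */
CAC_state can_accept_buggy(PMState state, int children, int max_children)
{
    assert(max_children >= 0);
    if (state == PM_WAIT_BACKUP)
        return CAC_WAITBACKUP;          /* limit is never checked */
    if (state != PM_RUN && state != PM_HOT_STANDBY)
        return CAC_SHUTDOWN;
    if (children >= max_children)
        return CAC_TOOMANY;
    return CAC_OK;
}

/*
 * One possible safer shape (assumption): record a tentative result,
 * then let the limit check override every "may proceed" status.
 */
CAC_state can_accept_checked(PMState state, int children, int max_children)
{
    CAC_state result = CAC_OK;

    assert(max_children >= 0);
    if (state == PM_WAIT_BACKUP)
        result = CAC_WAITBACKUP;        /* tentative, limit checked below */
    else if (state != PM_RUN && state != PM_HOT_STANDBY)
        return CAC_SHUTDOWN;            /* outright rejection, no limit needed */
    if (children >= max_children)
        result = CAC_TOOMANY;
    return result;
}
```

With the child limit already reached and the state at PM_WAIT_BACKUP, the
first variant still lets the connection through, while the second one
reports CAC_TOOMANY. Whether this matches what actually happens on the
affected box is not established by the log above.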

Stefan
