Re: Latch implementation that wakes on postmaster death on both win32 and Unix

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Date: 2011-07-08 14:58:11
Message-ID: 1D4E5807-8E49-4441-9E80-210E71A91501@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul8, 2011, at 14:40 , Heikki Linnakangas wrote:
> On 08.07.2011 13:58, Florian Pflug wrote:
>> On Jul8, 2011, at 11:57 , Peter Geoghegan wrote:
>>> On 7 July 2011 19:15, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>>>> I'm not concerned about the possibility of spurious extra cycles of
>>>>> auxiliary process event loops - should I be?
>>>>
>>>> A tight loop would be bad, but an occasional spurious wake-up seems harmless.
>>>
>>> We should also assert !PostmasterIsAlive() from within the latch code
>>> after waking due to apparent Postmaster death. The reason that I don't
>>> want to follow Florian's suggestion to check it in production is that
>>> I don't know what to do if the postmaster turns out to be alive. Why
>>> is it more reasonable to try again than to just return?
>>
>> I'd say return, but don't indicate postmaster death in the return value
>> if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
>> WaitLatch(), and return indicating postmaster death whenever select()
>> says so, and put the burden of re-checking on the callers.
>
> I put the burden on the callers. Removing the return value from WaitLatch() altogether just makes life unnecessarily difficult for callers that could safely use that information, even if you sometimes get spurious wakeups. In particular, the coding in pgarch.c is nicer if you can simply check the return code for WL_TIMEOUT, rather than call time(NULL) to figure out if the timeout was reached.
>
> Attached is a new version of this patch. PostmasterIsAlive() now uses read() on the pipe instead of kill().

I did notice a few (very minor) loose ends...

SyncRepWaitForLSN() says
/*
* Wait on latch for up to 60 seconds. This allows us to check for
* postmaster death regularly while waiting. Note that timeout here
* does not necessarily release from loop.
*/
WaitLatch(&MyProc->waitLatch, 60000000L);

I guess that 60-second timeout is unnecessary now that we'll wake up
on postmaster death anyway.

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

Maybe these cleanups would better be done in a separate patch, though...

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stuart Bishop 2011-07-08 15:13:17 Re: storing TZ along timestamps
Previous Message Heikki Linnakangas 2011-07-08 14:57:07 Re: Re: [COMMITTERS] pgsql: Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.