Re: Latch implementation that wakes on postmaster death on both win32 and Unix

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Date: 2011-07-08 16:08:16
Message-ID: DB00E4A5-3CBC-489D-81EC-D9402940C6B7@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul8, 2011, at 17:56 , Peter Geoghegan wrote:
> On 8 July 2011 15:58, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> SyncRepWaitForLSN() says
>> /*
>> * Wait on latch for up to 60 seconds. This allows us to check for
>> * postmaster death regularly while waiting. Note that timeout here
>> * does not necessarily release from loop.
>> */
>> WaitLatch(&MyProc->waitLatch, 60000000L);
>>
>> I guess that 60-second timeout is unnecessary now that we'll wake up
>> on postmaster death anyway.
>
> We won't wake up on Postmaster death here, because we haven't asked to
> - not yet, anyway. We're just using the new interface here for that
> one function call in v8.

Oh, Right. I still think it'd might be worthwhile to convert it, but
but not in this patch.

>> Also, none of the callers of WaitLatch() seems to actually inspect
>> the WL_POSTMASTER_DEATH bit of the result. We might want to make
>> their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
>> bit being set. At least in the case of SyncRepWaitForLSN(), it seems
>> that avoiding the extra read() syscall might be beneficial.
>
> I don't think so. Postmaster death is an anomaly, so why bother with
> any sort of optimisation for that case? Also, that's exactly the sort
> of thing that we're trying to caution callers against doing with this
> comment:
>
> "That should be rare in practice, but the caller should not use the
> return value for anything critical, re-checking the situation with
> PostmasterIsAlive() or read() on a socket if necessary."

Uh, I phrased that badly. What I meant was doing

if ((result & WL_POSTMASTER_DEATH) && (!PostmasterIsAlive())

instead of

if (!PostmasterIsAlive)

It seems that currently SyncRepWaitForLSN() will execute
PostmasterIsAlive() after every wake up. But actually it only needs
to do that if WaitLatch() sets WL_POSTMASTER_DEATH. Usually we wouldn't
care, but in the case of SyncRepWaitForLSN() I figures that we might.
It's in the code path of COMMIT (in the case of synchronous replication)
after all...

We'd not optimize the case of a dead postmaster, but the case of
an live one. Which I do hope is the common case ;-)

> You might say that the only reason we even bother reporting postmaster
> death in the returned bitfield is because there is an expectation that
> it will report it, given that we use the same masks on wakeEvents to
> inform the function what events we'll actually be waiting on for the
> call.

I kinda guessed that to be the reason after reading the latest patch ;-)

best regards,
Florian Pflug

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-07-08 16:10:23 blog post on ancient history
Previous Message Robert Haas 2011-07-08 16:01:16 Re: Re: [COMMITTERS] pgsql: Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.