Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)
Date: 2021-03-01 11:00:08
Message-ID: CA+hUKGJXYSazAYmBN4_BO8A5YiL9doT2VZr3ELaCP0RYwXzuqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Thu, Sep 24, 2020 at 05:55:17PM +1200, Thomas Munro wrote:
> > Right, RestoreArchivedFile() uses system(), so I guess it can hang
> > around for a long time after unexpected postmaster exit on every OS if
> > the command waits. To respond to various kinds of important
> > interrupts, I suppose that'd ideally use something like
> > OpenPipeStream() and a typical WaitLatch() loop with CFI(). I'm not
> > sure what our policy is or should be for exiting while we have running
> > subprocesses. I guess that is a separate issue.
>
> - if (IsUnderPostmaster && !PostmasterIsAlive())
> + if (IsUnderPostmaster &&
> +#ifndef USE_POSTMASTER_DEATH_SIGNAL
> + count++ % 1024 == 0 &&
> +#endif
> + !PostmasterIsAlive())
> That's pretty hack-ish, still efficient. Could we consider a
> different approach like something relying on
> PostmasterIsAliveInternal() with repetitive call handling? This may
> not be the only place where we care about that, particularly for
> non-core code.

As far as I know there aren't any other places that do polling of
PostmasterIsAlive() in a loop like this. All others have been removed
from core code: either they already had a WaitLatch() or similar so it
we just had to add WL_EXIT_ON_PM_DEATH, or they do pure CPU-bound and
don't actively try to detect postmaster death. That's why it seems
utterly insane that here we try to detect it X million times per
second.

> No objections with the two changes from pg_usleep() to WaitLatch() so
> they could be applied separately first.

I thought about committing that first part, and got as far as
splitting the patch into two (see attached), but then I re-read
Fujii-san's message about the speed of promotion and realised that we
really should have something like a condition variable for walRcvState
changes. I'll look into that.

Attachment Content-Type Size
v5-0001-Replace-some-sleep-poll-loops-with-WaitLatch.patch text/x-patch 4.3 KB
v5-0002-Poll-postmaster-less-frequently-in-recovery.patch text/x-patch 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2021-03-01 11:13:56 Re: Regex back-reference semantics and performance
Previous Message Michael Banck 2021-03-01 10:12:49 [PATCH] Add --create-only option to pg_dump/pg_dumpall