Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Marco Pfatschbacher <Marco_Pfatschbacher(at)genua(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Date: 2018-04-10 23:57:20
Message-ID: CAEepm=1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
>> Yeah, I wondered why that was different than the pattern established
>> elsewhere when I was hacking on replication code. There are actually
>> several places where we call PostmasterIsAlive() unconditionally in a
>> loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
>
> Note that just changing this implies a behavioural change in at least
> some of those: If the loop is busy with work, the WaitLatch might never
> be reached. I think that might be ok in most of those, but it does
> require examination.

Rebased, but I don't actually like this patch any more. Over in
another thread[1] I proposed that we should just make exit(1) the
default behaviour built into latch.c for those cases that don't want
to do something special (eg SyncRepWaitForLSN()), so we don't finish
up duplicating the ugly exit(1) code everywhere (or worse, forgetting
to). Tom Lane seemed to think that was an idea worth pursuing.

I think what we need for PG12 is a patch that does that, and also
introduces a reused WaitEventSet object in several of these places.
Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
objects every time (assuming an implementation like epoll or
eventually kqueue, completion ports etc is available).

Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.

[1] https://www.postgresql.org/message-id/CAEepm%3D2LqHzizbe7muD7-2yHUbTOoF7Q%2BqkSD5Q41kuhttRTwA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Replace-PostmasterIsAlive-calls-with-WL_POSTMASTER_D.patch application/octet-stream 6.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-04-10 23:57:57 Re: 2018-03 CF Cleanup
Previous Message Michael Paquier 2018-04-10 23:53:29 Re: Gotchas about pg_verify_checksums