Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Date: 2018-11-17 01:27:02
Message-ID: CAEepm=0-kK3CLhX4zPRLBzvotTvxeYO_Eiwcz0R=4HZnpNugvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 17, 2018 at 5:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > [ 0001-Add-WL_EXIT_ON_PM_DEATH-pseudo-event-v4.patch ]
>
> I took a quick look through this. I have no objection to the idea of
> letting the latch infrastructure do the proc_exit(1), but I'm wondering
> why this is in the thread that it's in. Is there any remaining connection
> to the original complaint about BSDen not coping well with lots of
> processes waiting on the same pipe?

It turned into a general clean-up of inconsistent postmaster death
handling after the one-pipe-per-backend proposal was rejected.
Perhaps I should have started a new thread, but it does in fact
address most cases of processes calling PostmasterIsAlive()
frequently, which was part of the original complaint. That falls into
the last-mentioned category from this paragraph of the commit message:

Repair all code that was previously ignoring postmaster death completely,
or requesting the event but ignoring it, or requesting the event but then
doing an unconditional PostmasterIsAlive() call every time through its
event loop (which is an expensive syscall on platforms for which we don't
have USE_POSTMASTER_DEATH_SIGNAL support).

Other potential improvements that could be made on both sides:

* We should probably be smarter about how often we call
PostmasterIsAlive() in the recovery loop, which is the only remaining
case like that (because it's a busy loop, given enough WAL to chew on,
so there is no waiting primitive that would report pipe readiness).

* It would be nice if more kernels supported parent death signals.

* Based on this report, those kernels really should sort out their
wakeup logic for duplicated pipe fds. I wonder if other multi-process
applications have the same problem. Our logger pipe and the proposed
checkpointer fsync pipe too, maybe, not sure.

> I'd advise making the code in places like this look like
>
> (void) WaitLatch(MyLatch, ...);
>
> Otherwise, you are likely to draw complaints about "return value is
> sometimes ignored" from Coverity and other static analyzers. The
> (void) cast makes it explicit that you're intentionally ignoring
> the result value in this one place.

Done.

> Since exit_on_postmaster_death = true is going to be the normal case,
> it seems a bit unfortunate that we have to incur this looping every time
> we go through WaitEventSetWait. Can't we put the handling of this in some
> spot where it only gets executed when we've detected WL_POSTMASTER_DEATH,
> like right after the PostmasterIsAliveInternal calls?
>
> if (!PostmasterIsAliveInternal())
> {
> + if (set->exit_on_postmaster_death)
> + proc_exit(1);
> occurred_events->fd = PGINVALID_SOCKET;
> occurred_events->events = WL_POSTMASTER_DEATH;
> occurred_events++;
> returned_events++;
> }

I was probably trying to avoid repeating the code in each of the 3
implementations of that function (poll, epoll, win32). But there is
already other similar duplication there and I agree that is better.
Done that way.

Thanks for the review.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Add-WL_EXIT_ON_PM_DEATH-pseudo-event-v5.patch application/octet-stream 42.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-11-17 01:47:24 Re: pg11.1 jit segv
Previous Message Justin Pryzby 2018-11-17 01:23:44 Re: pg11.1 jit segv