Re: Unportable implementation of background worker start

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unportable implementation of background worker start
Date: 2017-04-21 16:50:04
Message-ID: 18193.1492793404@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a lightly-tested draft patch that converts the postmaster to
use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions
about whether this is the direction to proceed, though. It adds at least
a couple of kernel calls per postmaster signal delivery, and probably to
every postmaster connection acceptance (ServerLoop iteration), to fix
problems that are so corner-casey that we've never even known we had them
till now.

I'm looking longingly at pselect(2), ppoll(2), and epoll_pwait(2), which
would solve the problem without need for a self-pipe. But the latter two
are Linux-only, and while pselect does exist in recent POSIX editions,
it's nonetheless got portability issues. Googling suggests that on a
number of platforms, pselect is non-atomic, ie it's nothing but a
wrapper for sigprocmask/select/sigprocmask, which would mean that the
race condition I described in my previous message still exists.

Despite that, a pretty attractive proposal is to do, essentially,

#ifdef HAVE_PSELECT
selres = pselect(nSockets, &rmask, NULL, NULL, &timeout, &UnBlockSig);
#else
PG_SETMASK(&UnBlockSig);
selres = select(nSockets, &rmask, NULL, NULL, &timeout);
PG_SETMASK(&BlockSig);
#endif

This fixes the race on platforms where pselect exists and is correctly
implemented, and we're no worse off than before where that's not true.

The other component of the problem is the possibility that select() will
restart if the signal is marked SA_RESTART. (Presumably that would apply
to pselect too.) I am thinking that maybe the answer is "if it hurts,
don't do it" --- that is, in the postmaster maybe we shouldn't use
SA_RESTART, at least not for these signals.

A different line of thought is to try to provide a bulletproof solution,
but push the implementation problems down into latch.c --- that is, the
goal would be to provide a pselect-like variant of WaitEventSetWait that
is guaranteed to return if interrupted, as opposed to the current behavior
where it's guaranteed not to. But that seems like quite a bit of work.

Whether or not we decide to change over the postmaster.c code, I think
it'd likely be a good idea to apply most or all of the attached changes
in latch.c. Setting CLOEXEC on the relevant FDs is clearly a good thing,
and the other changes will provide some safety if some preloaded extension
decides to create a latch in the postmaster process.

regards, tom lane

Attachment Content-Type Size
make-postmaster-use-WaitEventSet-v1.patch text/x-diff 20.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Borodin 2017-04-21 17:15:07 Re: Range Merge Join v1
Previous Message Alvaro Herrera 2017-04-21 15:19:41 Re: Unportable implementation of background worker start