Skip site navigation (1) Skip section navigation (2)

Re: Unportable implementation of background worker start

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unportable implementation of background worker start
Date: 2017-04-21 16:50:04
Message-ID: 18193.1492793404@sss.pgh.pa.us (view raw, whole thread or download thread mbox)
Thread:
Lists: pgsql-hackers
Attached is a lightly-tested draft patch that converts the postmaster to
use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
about whether this is the direction to proceed, though.  It adds at least
a couple of kernel calls per postmaster signal delivery, and probably to
every postmaster connection acceptance (ServerLoop iteration), to fix
problems that are so corner-casey that we've never even known we had them
till now.

I'm looking longingly at pselect(2), ppoll(2), and epoll_pwait(2), which
would solve the problem without need for a self-pipe.  But the latter two
are Linux-only, and while pselect does exist in recent POSIX editions,
it's nonetheless got portability issues.  Googling suggests that on a
number of platforms, pselect is non-atomic, ie it's nothing but a
wrapper for sigprocmask/select/sigprocmask, which would mean that the
race condition I described in my previous message still exists.

Despite that, a pretty attractive proposal is to do, essentially,

#ifdef HAVE_PSELECT
        selres = pselect(nSockets, &rmask, NULL, NULL, &timeout, &UnBlockSig);
#else
        PG_SETMASK(&UnBlockSig);
        selres = select(nSockets, &rmask, NULL, NULL, &timeout);
        PG_SETMASK(&BlockSig);
#endif

This fixes the race on platforms where pselect exists and is correctly
implemented, and we're no worse off than before where that's not true.

The other component of the problem is the possibility that select() will
restart if the signal is marked SA_RESTART.  (Presumably that would apply
to pselect too.)  I am thinking that maybe the answer is "if it hurts,
don't do it" --- that is, in the postmaster maybe we shouldn't use
SA_RESTART, at least not for these signals.

A different line of thought is to try to provide a bulletproof solution,
but push the implementation problems down into latch.c --- that is, the
goal would be to provide a pselect-like variant of WaitEventSetWait that
is guaranteed to return if interrupted, as opposed to the current behavior
where it's guaranteed not to.  But that seems like quite a bit of work.

Whether or not we decide to change over the postmaster.c code, I think
it'd likely be a good idea to apply most or all of the attached changes
in latch.c.  Setting CLOEXEC on the relevant FDs is clearly a good thing,
and the other changes will provide some safety if some preloaded extension
decides to create a latch in the postmaster process.

			regards, tom lane


Attachment: make-postmaster-use-WaitEventSet-v1.patch
Description: text/x-diff (20.0 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Andrew BorodinDate: 2017-04-21 17:15:07
Subject: Re: Range Merge Join v1
Previous:From: Alvaro HerreraDate: 2017-04-21 15:19:41
Subject: Re: Unportable implementation of background worker start

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group