Re: Excessive PostmasterIsAlive calls slow down WAL redo

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Excessive PostmasterIsAlive calls slow down WAL redo
Date: 2018-04-10 00:53:30
Message-ID: 20180410005330.rqjjpjyw3c265j2n@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-04-05 12:20:38 -0700, Andres Freund wrote:
> > While it's not POSIX, at least some platforms are capable of delivering
> > a separate signal on parent process death. Perhaps using that where
> > available would be enough of an answer.
>
> Yea, that'd work on linux. Which is probably the platform 80-95% of
> performance critical PG workloads run on. There's
> JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
> but I'm not sure it provides enough opportunity for cleanup.

I coincidentally got pinged about our current approach causing
performance problems on FreeBSD and started writing a patch. The
problem there appears to be that constantly attaching events to the read
pipe end, from multiple processes, causes significant contention inside
the kernel. Which isn't that surprising. That's distinct from the
problem netbsd/openbsd reported a while back (superfluous wakeups).

That person said he'd work on adding an equivalent of linux'
prctl(PR_SET_PDEATHSIG) to FreeBSD.

It's not particularly hard to whip something up that kind of
works. How to get some of the details right isn't entirely clear,
however:

It's trivial to make PostmasterIsAlive() cheap. In my prototype I've
set up PR_SET_PDEATHSIG to deliver SIGINFO to postmaster children. That
sets parent_potentially_died to true. PostmasterIsAlive() only runs the
main body when it's true, making it very cheap in the common case.

What I'm not quite as clear on, however, is how best to change
WL_POSTMASTER_DEATH.

One approach I can come up with is to use the self-pipe for
both WL_POSTMASTER_DEATH and WL_LATCH_SET. As we can cheaply check
whether either set->latch->is_set or parent_potentially_died is set,
re-using the self-pipe works well enough. What I'm not quite sure about,
however, is how to best do so with epoll() - there we explicitly do not
iterate over all registered events, but use epoll_event->data to get just
the wait event associated with a readiness event. Therefore we have to
keep track of whether a WaitEventSet has both WL_POSTMASTER_DEATH and
WL_LATCH_SET registered, and check in both
WL_POSTMASTER_DEATH/WL_LATCH_SET whether the event is being waited for.
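
A rough sketch of the shared self-pipe idea with epoll(), under stated assumptions: the struct, constants and function names are hypothetical and are not PostgreSQL's WaitEventSet API. The single epoll registration cannot say which logical event fired, so the set remembers which events it wants and re-checks both flags after every wakeup.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define EV_LATCH_SET     (1u << 0)   /* hypothetical stand-in for WL_LATCH_SET */
#define EV_PARENT_DEATH  (1u << 1)   /* hypothetical stand-in for WL_POSTMASTER_DEATH */

static int selfpipe[2] = {-1, -1};
static volatile sig_atomic_t latch_is_set = 0;
static volatile sig_atomic_t parent_potentially_died = 0;

struct wait_set
{
    int      epfd;
    unsigned wanted;    /* which logical events this set waits for */
};

/* Async-signal-safe wakeup, usable from either signal handler. */
static void
wake_selfpipe(void)
{
    int  save_errno = errno;
    char c = 0;

    if (write(selfpipe[1], &c, 1) < 0)
    {
        /* EAGAIN means the pipe is full, so a wakeup is already pending. */
    }
    errno = save_errno;
}

static int
wait_set_init(struct wait_set *ws, unsigned wanted)
{
    struct epoll_event ev = {0};

    assert(ws != NULL);
    if (selfpipe[0] < 0)
    {
        if (pipe(selfpipe) != 0)
            return -1;
        fcntl(selfpipe[0], F_SETFL, O_NONBLOCK);
        fcntl(selfpipe[1], F_SETFL, O_NONBLOCK);
    }
    ws->wanted = wanted;
    ws->epfd = epoll_create1(0);
    if (ws->epfd < 0)
        return -1;
    ev.events = EPOLLIN;
    ev.data.fd = selfpipe[0];   /* one registration serves both events */
    return epoll_ctl(ws->epfd, EPOLL_CTL_ADD, selfpipe[0], &ev);
}

/* Report only the events this set actually waits for. */
static unsigned
pending_events(const struct wait_set *ws)
{
    unsigned occurred = 0;

    if ((ws->wanted & EV_LATCH_SET) && latch_is_set)
        occurred |= EV_LATCH_SET;
    if ((ws->wanted & EV_PARENT_DEATH) && parent_potentially_died)
        occurred |= EV_PARENT_DEATH;
    return occurred;
}

static unsigned
wait_set_wait(struct wait_set *ws, int timeout_ms)
{
    struct epoll_event ev;
    char     buf[64];
    unsigned occurred = pending_events(ws);

    if (occurred != 0)
        return occurred;        /* cheap check, no system call needed */
    if (epoll_wait(ws->epfd, &ev, 1, timeout_ms) == 1 &&
        ev.data.fd == selfpipe[0])
    {
        while (read(selfpipe[0], buf, sizeof(buf)) > 0)
            ;                   /* drain the pipe */
    }
    return pending_events(ws);
}
```

In this sketch a set that registered only the latch ignores a parent-death wakeup on the shared pipe, which is the bookkeeping the paragraph above describes.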

Another approach, that's simpler to implement, is to simply have a
second selfpipe, just for WL_POSTMASTER_DEATH.

A third question is whether we want to keep postmaster_alive_fds if we
have PR_SET_PDEATHSIG. I'd want to continue re-checking that the parent
is actually dead, but we could also do so by kill()ing postmaster like
we used to do before 89fd72cbf26f5d2e3d86ab19c1ead73ab8fac0fe. It's not
bad to get rid of unnecessary file descriptors. It would imply a semantic
change, however.
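
For reference, a kill()-based liveness probe can be sketched as below. This is a generic illustration with a hypothetical function name, not the code from before the commit mentioned above. One well-known caveat of probing by pid is that a recycled pid can make an unrelated process look like the parent.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal 0 performs only the existence and permission checks; nothing is
 * delivered. EPERM means the process exists but we may not signal it.
 */
static bool
process_exists(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```

A child would call process_exists(postmaster_pid) only after the parent-death flag has been set, so the extra system call stays off the common path.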

Greetings,

Andres Freund
