|From:||Marco Pfatschbacher <Marco_Pfatschbacher(at)genua(dot)de>|
|Subject:||PATCH: Keep one postmaster monitoring pipe per process|
The current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited by all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
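The shared-pipe scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the PostgreSQL source: make_watch_pipe() and writer_is_alive() are hypothetical names, and the sketch assumes the read end is set non-blocking so that a liveness check never stalls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a monitoring pipe and make its read end
 * non-blocking. fds[0] is the end the children watch, fds[1] is the end
 * the parent keeps open for its whole lifetime.
 */
int
make_watch_pipe(int fds[2])
{
    int         flags;

    if (pipe(fds) < 0)
        return -1;
    flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/*
 * Hypothetical liveness check: a non-blocking read() fails with EAGAIN
 * while some process still holds the write end, and returns 0 (EOF)
 * once every copy of the write end has been closed.
 */
bool
writer_is_alive(int watch_fd)
{
    char        c;
    ssize_t     rc;

    assert(watch_fd >= 0);
    rc = read(watch_fd, &c, 1);
    if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return true;
    return false;               /* EOF, unexpected data, or a real error */
}
```

For the check to work across fork(), each child has to close its inherited copy of fds[1]; otherwise the pipe never reports EOF when the parent exits. With a single pipe, every child ends up waiting on the same fds[0].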
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour in the select/poll
implementation on some operating systems:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.
In our case the system had to wake up ~3000 idle ssh processes
every time PostgreSQL called PostmasterIsAlive().
WalReceiver triggered this at a rate of ~400 calls per second,
with the result that the system performed very badly,
being mostly busy scheduling idle processes.
Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.
Since the postmaster has three different ways to create
new processes, the patch got a bit more complicated than
I anticipated :)
For auxiliary processes, pgstat, pgarch and the autovacuum launcher
get a preallocated pipe each. The pipes are held in:
Just before we fork a new process we set postmaster_alive_fd
for each process type:
postmaster_alive_fd = postmaster_alive_fds_watch[type];
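A minimal sketch of the preallocated-pipe idea, assuming one pipe per process type. Only postmaster_alive_fd and postmaster_alive_fds_watch are names taken from this message; the enum, the alive_fds_own array and both functions are hypothetical and may differ from what the patch actually does.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical process-type index; the patch's real type list is not shown. */
enum watched_proc
{
    WATCHED_PGSTAT,
    WATCHED_PGARCH,
    WATCHED_AVLAUNCHER,
    NUM_WATCHED_PROCS
};

/* Write ends, kept open by the parent (array name is an assumption). */
int         alive_fds_own[NUM_WATCHED_PROCS];

/* Read ends, one per process type (name taken from the message). */
int         postmaster_alive_fds_watch[NUM_WATCHED_PROCS];

/* The single descriptor the next forked child will watch. */
int         postmaster_alive_fd = -1;

/* Allocate one pipe per process type up front; returns 0 on success. */
int
init_alive_pipes(void)
{
    for (int i = 0; i < NUM_WATCHED_PROCS; i++)
    {
        int         fds[2];

        if (pipe(fds) < 0)
            return -1;
        postmaster_alive_fds_watch[i] = fds[0];
        alive_fds_own[i] = fds[1];
    }
    return 0;
}

/* Called just before fork(): pick the read end for this process type. */
void
select_alive_fd(enum watched_proc type)
{
    assert((int) type >= 0 && (int) type < NUM_WATCHED_PROCS);
    postmaster_alive_fd = postmaster_alive_fds_watch[type];
}
```

Because each process type gets its own read end, no two of these children wait on the same descriptor.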
Since there can be multiple backend processes, BackendStartup()
allocates a pipe on demand and keeps the reference in the Backend
structure. The pipe is closed when the backend terminates.
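The on-demand variant for backends might look roughly like this. It is a sketch only: BackendSketch stands in for the real Backend structure, and its field names and both functions are assumptions rather than code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the Backend structure. */
typedef struct BackendSketch
{
    pid_t       pid;            /* child process id */
    int         alive_fd_own;   /* write end, held by the parent */
    int         alive_fd_watch; /* read end, watched by the child */
} BackendSketch;

/* Allocate a fresh pipe for one backend; returns 0 on success. */
int
backend_open_alive_pipe(BackendSketch *bn)
{
    int         fds[2];

    assert(bn != NULL);
    if (pipe(fds) < 0)
        return -1;
    bn->alive_fd_watch = fds[0];
    bn->alive_fd_own = fds[1];
    return 0;
}

/* Release the pipe when the backend has terminated. */
void
backend_close_alive_pipe(BackendSketch *bn)
{
    assert(bn != NULL);
    if (bn->alive_fd_own >= 0)
        close(bn->alive_fd_own);
    if (bn->alive_fd_watch >= 0)
        close(bn->alive_fd_watch);
    bn->alive_fd_own = -1;
    bn->alive_fd_watch = -1;
}
```

Closing the descriptors on termination matters here because the number of backends is not bounded in advance, so leaked pipes would accumulate in the parent.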
The patch was developed and tested under OpenBSD using the REL9_4_STABLE
branch. I've merged it to current, compile tested and ran make check
on Ubuntu 14.04.
"Internally to the kernel, select() and pselect() work poorly if multiple
processes wait on the same file descriptor. Given that, it is rather
surprising to see that many daemons are written that way."
At least OpenBSD and NetBSD are affected; FreeBSD rewrote
their select implementation in 8.0.