Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2017-12-12 19:01:48
Message-ID: CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 11, 2017 at 10:41 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> In particular, if the worker crashed (say somebody forcefully killed
> the worker), then I think it will lead to a restart of the server. So
> not sure, adding anything about that in the comment will make much
> sense.

I think it's dangerous to rely on that assumption. If somebody stuck
proc_exit(1) into the very beginning of the worker startup sequence,
the worker would exit but the server would not restart.

As I started experimenting with this, I discovered that your patch is
actually unsafe. Here's a test case that shows the breakage:

set force_parallel_mode = on;
select 1;

The expected outcome is:

 ?column?
----------
        1
(1 row)

If I hack do_start_bgworker() to make worker startup fail, like this:

--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5701,7 +5701,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 #ifdef EXEC_BACKEND
     switch ((worker_pid = bgworker_forkexec(rw->rw_shmem_slot)))
 #else
-    switch ((worker_pid = fork_process()))
+    switch ((worker_pid = -1))
 #endif
     {
         case -1:

...then it hangs forever. With your patch it no longer hangs forever,
which is good, but it also returns the wrong answer, which is bad:

 ?column?
----------
(0 rows)

The problem is that Gather sees nworkers_launched > 0 and assumes that
it is therefore entitled not to perform a local scan. I'm pretty sure
that there could also be cases where this patch makes us miss an error
reported by a worker; suppose the worker reports an error and exits
after WaitForParallelWorkersToFinish does
CHECK_FOR_INTERRUPTS() but before we reach GetBackgroundWorkerPid().
The code will simply conclude that the worker is dead and stop
waiting, never mind the fact that there's a pending error in the
queue. This is exactly why I made parallel workers send a Terminate
message to the leader instead of just disappearing -- the leader needs
to see that message to know that it's reached the end of what the
worker is going to send.
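
To see the shape of the problem, here's a rough sketch of the wait
loop as I understand it under your patch (simplified pseudo-C, not the
actual code; "handle" is the worker's BackgroundWorkerHandle):

for (;;)
{
    pid_t       pid;

    CHECK_FOR_INTERRUPTS();     /* a queued error would surface here */

    /*
     * Window: if the worker sends its error message and exits right
     * now, that message is still sitting unread in the error queue
     * when we check the PID below...
     */
    if (GetBackgroundWorkerPid(handle, &pid) == BGWH_STOPPED)
        break;                  /* ...but we decide the worker is dead
                                 * and stop waiting, losing the error. */

    WaitLatch(MyLatch, WL_LATCH_SET, -1, WAIT_EVENT_PARALLEL_FINISH);
    ResetLatch(MyLatch);
}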

So, I think we need to regard fork failure and related failure modes
as ERROR conditions. It's not equivalent to failing to register the
background worker. When we fail to register the background worker,
the parallel.c machinery knows at LaunchParallelWorkers() time that
the worker will never show up, lets the calling code know by setting
nworkers_launched accordingly, and the calling code can then choose
to proceed
with that number of workers or ERROR out as it chooses. But once we
decide on the value of nworkers_launched to report to callers, it's
really not OK for workers to just silently never start, as the test
case above shows. We could make it the job of code that uses the
ParallelContext machinery to cope with that situation, but I think
that's a bad plan, because every user would have to worry about this
obscure case. Right now, the only in-core use of the ParallelContext
machinery is Gather/Gather Merge, but there are pending patches for
parallel index scan and parallel VACUUM, so we'll just end up adding
this handling to more and more places. And for what? If you've got
max_connections and/or max_parallel_workers set so high that your
system can't fork(), you have a big problem, and forcing things that
would have run using parallel query to run serially instead of
erroring out is not going to fix it. I think that deciding we're
going to handle fork() failure by reporting an ERROR is just fine.
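
For contrast, a caller of the existing machinery can already cope with
registration failure, because that failure is visible at launch time;
roughly (hypothetical caller, simplified):

LaunchParallelWorkers(pcxt);
if (pcxt->nworkers_launched == 0)
{
    /* no workers could be registered; run the serial path instead,
     * or ereport(ERROR, ...) if that's preferable */
    DestroyParallelContext(pcxt);
}

Nothing similar is possible for fork() failure, because by the time it
happens, nworkers_launched has already been reported to the caller.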

So, here's a patch. This patch keeps track of whether we've ever
received a message from each worker via the error queue. If none of
the workers that have yet to exit cleanly has ever sent us a message,
then it calls GetBackgroundWorkerPid() for each of them. If any worker
reports that it is BGWH_STOPPED, then it
double-checks whether the worker has attached to the error queue. If
it did, then it assumes that we'll detect clean shutdown or an error
on the next iteration and ignores the problem. If not, then the
worker died without ever attaching to the error queue, so it throws an
ERROR. I tested that this causes the test case above to throw a
proper ERROR when fork_process() returns -1. Please review...
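
To save reviewers a trip into the attachment, the heart of the new
check looks roughly like this (simplified; names and details may
differ slightly from the actual patch):

/* Runs only when no not-yet-finished worker has ever sent a message. */
for (i = 0; i < pcxt->nworkers_launched; ++i)
{
    pid_t       pid;
    shm_mq     *mq;

    /* error_mqh == NULL means the worker already exited cleanly */
    if (pcxt->worker[i].error_mqh == NULL)
        continue;

    /* still starting up or running? then keep waiting */
    if (GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle,
                               &pid) != BGWH_STOPPED)
        continue;

    /*
     * The worker is stopped.  If it attached to the error queue, a
     * clean-shutdown or error message will show up on the next pass,
     * so ignore it for now.  If it never attached, no message is ever
     * coming, so report the failure.
     */
    mq = shm_mq_get_queue(pcxt->worker[i].error_mqh);
    if (shm_mq_get_sender(mq) == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                 errmsg("parallel worker failed to initialize")));
}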

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: parallel-worker-fork-failure.patch (application/octet-stream, 4.5 KB)
