Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2017-12-13 06:41:25
Message-ID: CAA4eK1LkHBbafduA8em1fb-in4kDBgmQf6qO5TV8CfyuTzEFaQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 13, 2017 at 12:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 11, 2017 at 10:41 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> In particular, if the worker crashed (say somebody forcefully killed
>> the worker), then I think it will lead to a restart of the server. So
>> not sure, adding anything about that in the comment will make much
>> sense.
>
>
> The problem is that Gather sees nworkers_launched > 0 and assumes that
> it is therefore entitled not to perform a local scan.
>

Yeah, I think that assumption is not right. Ideally, before making
that assumption, it should wait for the workers to start up.
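
To make that concrete, the leader could do something like this before
deciding to skip the local scan (a minimal sketch; the helper is
hypothetical, but WaitForBackgroundWorkerStartup() is the existing
bgworker API):

#include "postgres.h"
#include "access/parallel.h"
#include "postmaster/bgworker.h"

/*
 * Hypothetical helper: wait for each launched worker to either start
 * or fail, and report whether all of them actually started.
 */
static bool
AllLaunchedWorkersStarted(ParallelContext *pcxt)
{
    int         i;

    for (i = 0; i < pcxt->nworkers_launched; ++i)
    {
        pid_t       pid;

        /*
         * Blocks until the postmaster has started (or failed to start)
         * this worker.
         */
        if (WaitForBackgroundWorkerStartup(pcxt->worker[i].bgwhandle,
                                           &pid) != BGWH_STARTED)
            return false;   /* fork failed, worker died, or postmaster died */
    }
    return true;
}

Gather could then fall back to scanning locally whenever this returns
false.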

> I'm pretty sure
> that there could also be cases where this patch makes us miss an error
> reported by a worker; suppose the worker error is reported and the
> worker exits after WaitForParallelWorkersToFinish does
> CHECK_FOR_INTERRUPTS() but before we reach GetBackgroundWorkerPid().
> The code will simply conclude that the worker is dead and stop
> waiting, never mind the fact that there's a pending error in the
> queue.

I have tried to reproduce this situation by adding an error case in
the worker (just before the worker returns success, the 'X' message)
and by setting a breakpoint in WaitForParallelWorkersToFinish. What I
noticed is that the worker is not allowed to exit until it receives
some ack from the master (basically, the worker waits at
SendProcSignal after sending the message), and the error is reported
properly.
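
For reference, I injected the error in ParallelWorkerMain(), roughly
like this (the elog() line is the only addition):

    /* ... entrypoint has returned; the worker's work is done ... */

    elog(ERROR, "injected failure for testing");    /* added */

    /* Report success. */
    pq_putmessage('X', NULL, 0);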

> This is exactly why I made parallel workers send a Terminate
> message to the leader instead of just disappearing -- the leader needs
> to see that message to know that it's reached the end of what the
> worker is going to send.
>
> So, I think we need to regard fork failure and related failure modes
> as ERROR conditions. It's not equivalent to failing to register the
> background worker. When we fail to register the background worker,
> the parallel.c machinery knows at LaunchParallelWorkers() time that
> the worker will never show up and lets the calling code know by setting
> nworkers_launched, and the calling code can then choose to proceed
> with that number of workers or ERROR out as it chooses. But once we
> decide on the value of nworkers_launched to report to callers, it's
> really not OK for workers to just silently not ever start, as the test
> case above shows. We could make it the job of code that uses the
> ParallelContext machinery to cope with that situation, but I think
> that's a bad plan, because every user would have to worry about this
> obscure case. Right now, the only in-core use of the ParallelContext
> machinery is Gather/Gather Merge, but there are pending patches for
> parallel index scan and parallel VACUUM, so we'll just end up adding
> this handling to more and more places. And for what? If you've got
> max_connections and/or max_parallel_workers set so high that your
> system can't fork(), you have a big problem, and forcing things that
> would have run using parallel query to run serially instead of
> erroring out is not going to fix it. I think that deciding we're
> going to handle fork() failure by reporting an ERROR is just fine.
>

So, if I understand correctly, this means that it will return an
error if there is a problem in even one worker (exited or never
started), even though the other workers could have easily finished the
work. Won't such an error lead to situations where, after running a
query for an hour, the user sees a failure just because one of the
many workers failed to start (or some similar failure), even though
the query would otherwise have completed? If so, that doesn't sound
like sensible behavior.
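
For contrast, the registration-failure case is already visible to
callers at launch time, so they can make that choice themselves.
Roughly (illustrative, not quoting any actual caller):

    LaunchParallelWorkers(pcxt);

    if (pcxt->nworkers_launched < pcxt->nworkers)
    {
        /*
         * Some workers could not be registered.  The caller can proceed
         * with fewer workers, fall back to a serial plan, or error out.
         */
    }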

> So, here's a patch. This patch keeps track of whether we've ever
> received a message from each worker via the error queue. If there are
> no remaining workers which have neither exited cleanly nor sent us at
> least one message, then it calls GetBackgroundWorkerPid() for each
> one. If any worker reports that it is BGWH_STOPPED, then it
> double-checks whether the worker has attached to the error queue. If
> it did, then it assumes that we'll detect clean shutdown or an error
> on the next iteration and ignores the problem. If not, then the
> worker died without ever attaching the error queue, so it throws an
> ERROR. I tested that this causes the test case above to throw a
> proper ERROR when fork_process() returns -1. Please review...
>
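
As I read it, the new check amounts to something like the following
(my restatement, not the patch text):

    for (i = 0; i < pcxt->nworkers_launched; ++i)
    {
        pid_t       pid;
        shm_mq     *mq;

        if (pcxt->worker[i].error_mqh == NULL)
            continue;       /* this worker already shut down cleanly */

        if (GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle,
                                   &pid) == BGWH_STOPPED)
        {
            /* Stopped -- but did it ever attach to its error queue? */
            mq = shm_mq_get_queue(pcxt->worker[i].error_mqh);
            if (shm_mq_get_sender(mq) == NULL)
                ereport(ERROR,
                        (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                         errmsg("parallel worker failed to initialize")));
            /* else a clean exit or an error shows up on the next pass */
        }
    }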

This also doesn't appear to be completely safe. If we add
proc_exit(1) after attaching to the error queue (say, just after
pq_set_parallel_master) in the worker, then it will lead to a *hang*:
anyone_alive will still be false, and since the sender is set for the
error queue, the new check won't report any failure. Now, I think even
if we check the worker status (which will be stopped) and break after
the new error condition, it won't work, as it will still return zero
rows in the case you reported above.
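
Concretely, the proc_exit(1) scenario above was injected in
ParallelWorkerMain(), right after the queue is attached:

    pq_redirect_to_shm_mq(seg, mqh);
    pq_set_parallel_master(fps->parallel_master_pid,
                           fps->parallel_master_backend_id);

    proc_exit(1);           /* added: worker dies after attaching */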

I think if, before assuming we need not scan locally, we add a check
to see whether the workers have actually started, then my patch should
work fine and we don't need any additional checks. Also, it won't
create any performance issue, as we will perform such a check only
when force_parallel_mode is not off or when parallel leader
participation is off, which I think are not common cases.
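
Roughly, what I have in mind on the Gather side (illustrative;
AllLaunchedWorkersStarted() is the same hypothetical helper as above):

    ParallelContext *pcxt = gatherstate->pei->pcxt;

    if (pcxt->nworkers_launched > 0 &&
        (force_parallel_mode != FORCE_PARALLEL_OFF ||
         !parallel_leader_participation))
    {
        /*
         * Only in the uncommon configurations; the common case pays
         * nothing extra.
         */
        if (!AllLaunchedWorkersStarted(pcxt))
            gatherstate->need_to_scan_locally = true;
    }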

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
