Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2018-01-23 21:05:21
Message-ID: CA+TgmoYGDtSNE_Csw7v=fzeMKyX33ohFvVgH=i3-mCS+=EAOhg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> It does turn such cases into error but only at the end when someone
>> calls WaitForParallelWorkersToFinish. If one tries to rely on it at
>> any other time, it won't work as I think is the case for Parallel
>> Create Index where Peter G. is trying to use it differently. I am not
>> 100% sure whether Parallel Create index has this symptom as I have not
>> tried it with that patch, but I and Peter are trying to figure that
>> out.
>
> OK. I've committed this patch and back-patched it to 9.6.
> Back-patching to 9.5 didn't look simple because nworkers_launched
> doesn't exist on that branch, and it doesn't matter very much anyway
> since the infrastructure has no in-core users in 9.5.

I was just talking to Thomas, and one or the other of us realized that
this doesn't completely fix the problem. I think that if a worker
fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
after the point where it attached to the error queue, this patch is
sufficient to make that reliably turn into an error. But what if it
fails before that, e.g. a fork() failure? In that case, the problem
will be caught if the calling process calls
WaitForParallelWorkersToFinish(), but does that actually happen in any
interesting cases? Maybe not.

In the case of Gather, for example, we won't
WaitForParallelWorkersToFinish() until we've read all the tuples. If
there's a tuple queue that does not yet have a sender, then we'll just
wait for it to get one, even if the sender fell victim to a fork
failure and is never going to show up.

We could *almost* fix this by having execParallel.c include a Barrier
in the DSM, similar to what I proposed for parallel CREATE INDEX.
When a worker starts, it attaches to the barrier; when it exits, it
detaches. Instead of looping until the leader is done and all the
tuple queues are detached, Gather or Gather Merge could loop until the
leader is done and everyone detaches from the barrier. But that would
require changes to the Barrier API, which isn't prepared to have the
caller alternate waiting with other activity like reading the tuple
queues, which would clearly be necessary in this case. Moreover, it
doesn't work at all when parallel_leader_participation=off, because in
that case we'll again just wait forever for some worker to show up,
and if none of them do, then we're hosed.
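
Just to make the lifecycle concrete, here is roughly the shape I have in
mind (untested sketch; BarrierInit, BarrierAttach, BarrierDetach, and
BarrierParticipants are the existing barrier.h functions, but the
wrapper function names are invented for illustration):

#include "postgres.h"
#include "storage/barrier.h"

/* Leader, while initializing the DSM segment: it is the only participant. */
static void
initialize_worker_barrier(Barrier *b)
{
	BarrierInit(b, 1);
}

/* Worker, right after mapping the DSM segment. */
static void
worker_arrived(Barrier *b)
{
	BarrierAttach(b);
}

/* Worker, on the way out (including error paths, e.g. via a DSM detach hook). */
static void
worker_departed(Barrier *b)
{
	BarrierDetach(b);
}

/*
 * Gather / Gather Merge would loop reading tuple queues until its own work
 * is done and no workers remain attached; the hard part, as noted above, is
 * interleaving this check with waiting on the tuple queues.
 */
static bool
no_workers_attached(Barrier *b)
{
	return BarrierParticipants(b) == 1; /* only the leader remains */
}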

One idea to fix this is to add a new function in parallel.c with a
name like CheckForFailedWorkers() that Gather and Gather Merge call
just before they WaitLatch(). If a worker fails to start, the
postmaster will send SIGUSR1 to the leader, which will set the latch,
so we can count on the function being run after every worker death.
The function would go through and check each worker for which
pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
BGWH_STOPPED. If so, ERROR.
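
In sketch form, something like this (untested; CheckForFailedWorkers()
doesn't exist yet, and the exact errcode and message are placeholders):

#include "postgres.h"
#include "access/parallel.h"
#include "postmaster/bgworker.h"

/* Proposed: ERROR out if any worker died before attaching to its queue. */
void
CheckForFailedWorkers(ParallelContext *pcxt)
{
	int			i;

	for (i = 0; i < pcxt->nworkers_launched; i++)
	{
		pid_t		pid;

		/* Ignore workers that have already sent us at least one message. */
		if (pcxt->worker[i].error_mqh == NULL ||
			pcxt->any_message_received[i])
			continue;

		/*
		 * No message yet; if the worker is already stopped, it must have
		 * died before attaching to the error queue (e.g. fork() failure),
		 * so no error is ever going to arrive through the queue.
		 */
		if (GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
			BGWH_STOPPED)
			ereport(ERROR,
					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
					 errmsg("parallel worker failed to initialize"),
					 errhint("More details may be available in the server log.")));
	}
}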

This lets us *run* for arbitrary periods of time without detecting a
fork failure, but before we *wait* we will notice. Getting more
aggressive than that sounds expensive, but I'm open to ideas.

If we adopt this approach, then Peter's patch could probably make use
of it, too. It's a little tricky because ConditionVariableSleep()
tries not to return spuriously, so we'd either have to change that or
stick CheckForFailedWorkers() into that function's loop. I suspect
the former is a better approach. Or maybe that patch should still use
the Barrier approach and avoid all of this annoyance.
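
For example, if ConditionVariableSleep() were changed so that it can
return to the caller spuriously, the wait loop in that patch could look
roughly like this (sketch only; the done-condition callback and wait
event are placeholders):

#include "postgres.h"
#include "access/parallel.h"
#include "pgstat.h"
#include "storage/condition_variable.h"

/* Proposed function from upthread; doesn't exist yet. */
extern void CheckForFailedWorkers(ParallelContext *pcxt);

/* Wait for some condition, erroring out if a worker died before attaching. */
static void
wait_for_workers_to_attach(ParallelContext *pcxt, ConditionVariable *cv,
						   bool (*condition_met) (void))
{
	ConditionVariablePrepareToSleep(cv);
	while (!condition_met())
	{
		/* Notice fork() failures before going back to sleep. */
		CheckForFailedWorkers(pcxt);
		ConditionVariableSleep(cv, WAIT_EVENT_PARALLEL_FINISH);
	}
	ConditionVariableCancelSleep();
}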

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
