From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2018-01-24 02:45:21
Message-ID: CAA4eK1J0r=_6QzAOJ1=_LbgSMZhaW0U0s-HvLN=d2T6_=NCwMw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> OK. I've committed this patch and back-patched it to 9.6.
>> Back-patching to 9.5 didn't look simple because nworkers_launched
>> doesn't exist on that branch, and it doesn't matter very much anyway
>> since the infrastructure has no in-core users in 9.5.
>
> I was just talking to Thomas, and one or the other of us realized that
> this doesn't completely fix the problem. I think that if a worker
> fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
> after the point where it attached to the error queue, this patch is
> sufficient to make that reliably turn into an error. But what if it
> fails before that - e.g. a fork() failure? In that case, the leader
> will catch the problem when it calls
> WaitForParallelWorkersToFinish, but does that actually happen in any
> interesting cases? Maybe not.
>
> In the case of Gather, for example, we won't
> WaitForParallelWorkersToFinish() until we've read all the tuples. If
> there's a tuple queue that does not yet have a sender, then we'll just
> wait for it to get one, even if the sender fell victim to a fork
> failure and is never going to show up.
>

Hmm, I think that case is already handled, because the tuple queue
machinery can detect that its counterparty (the worker) never
attached. That happens in the code path
shm_mq_receive->shm_mq_counterparty_gone, where
shm_mq_counterparty_gone uses GetBackgroundWorkerPid to detect that
the worker is gone. Moreover, I manually tested this particular case
before concluding that your patch was fine. Do you have some other
case in mind that I am missing?
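
For reference, here is a simplified paraphrase of that detection logic
(condensed from src/backend/storage/ipc/shm_mq.c; a sketch from memory,
not the exact code):

/*
 * Sketch: called by shm_mq_receive() when the sender has not yet
 * attached to the queue.  Returns true if the counterparty can no
 * longer attach, so the receiver should not wait for it.
 */
static bool
shm_mq_counterparty_gone(shm_mq *mq, BackgroundWorkerHandle *handle)
{
    pid_t       pid;

    /* If the counterparty attached and later detached, it is gone. */
    if (mq->mq_detached)
        return true;

    /*
     * If it never attached, ask the postmaster's bookkeeping whether
     * the worker still exists.  BGWH_STOPPED covers both a worker that
     * exited and one that could never be started (e.g. fork() failure).
     */
    if (handle != NULL &&
        GetBackgroundWorkerPid(handle, &pid) == BGWH_STOPPED)
        return true;

    return false;
}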

> We could *almost* fix this by having execParallel.c include a Barrier
> in the DSM, similar to what I proposed for parallel CREATE INDEX.
> When a worker starts, it attaches to the barrier; when it exits, it
> detaches. Instead of looping until the leader is done and all the
> tuple queues are detached, Gather or Gather Merge could loop until the
> leader is done and everyone detaches from the barrier. But that would
> require changes to the Barrier API, which isn't prepared to have the
> caller alternate waiting with other activity like reading the tuple
> queues, which would clearly be necessary in this case. Moreover, it
> doesn't work at all when parallel_leader_participation=off, because in
> that case we'll again just wait forever for some worker to show up,
> and if none of them do, then we're hosed.
>
> One idea to fix this is to add a new function in parallel.c with a
> name like CheckForFailedWorkers() that Gather and Gather Merge call
> just before they WaitLatch(). If a worker fails to start, the
> postmaster will send SIGUSR1 to the leader, which will set the latch,
> so we can count on the function being run after every worker death.
> The function would go through and check each worker for which
> pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
> see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
> BGWH_STOPPED. If so, ERROR.
>
> This lets us *run* for arbitrary periods of time without detecting a
> fork failure, but before we *wait* we will notice. Getting more
> aggressive than that sounds expensive, but I'm open to ideas.
>
> If we adopt this approach, then Peter's patch could probably make use
> of it, too. It's a little tricky because ConditionVariableSleep()
> tries not to return spuriously, so we'd either have to change that or
> stick CheckForFailedWorkers() into that function's loop. I suspect
> the former is a better approach. Or maybe that patch should still use
> the Barrier approach and avoid all of this annoyance.
>
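
For concreteness, my reading of the CheckForFailedWorkers() you
describe would be roughly the following. The function does not exist
yet; its name and the any_message_received[] flag are taken from your
description above, so treat this as an illustrative sketch only:

void
CheckForFailedWorkers(ParallelContext *pcxt)
{
    int     i;

    for (i = 0; i < pcxt->nworkers_launched; ++i)
    {
        pid_t   pid;

        /*
         * Only workers that still have an error queue but have never
         * sent us any message are suspect.
         */
        if (pcxt->worker[i].error_mqh == NULL ||
            pcxt->any_message_received[i])
            continue;

        /*
         * The worker never attached to its error queue; if the
         * postmaster reports it as stopped, it failed to start
         * (e.g. fork() failure) and no message will ever arrive.
         */
        if (GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
            BGWH_STOPPED)
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                     errmsg("parallel worker failed to initialize")));
    }
}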

I think we can discuss your proposed solution once the problem is
clear. As of now, it is not clear to me that any problem remains.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
