Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2018-01-24 22:52:33
Message-ID: CAH2-Wz=TnL4xP4cuEqQy-bCzQJoyKR3Ec3dsMbHPvagroY=isw@mail.gmail.com

On Wed, Jan 24, 2018 at 12:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> In Thomas's test case, he's using 4 workers, and if even one of those
> manages to start, then it'll probably do so after any fork failures
> have already occurred, and the error will be noticed when that process
> sends its first message to the leader through the error queue, because
> it'll send PROCSIG_PARALLEL_MESSAGE via all the queues. If all of the
> workers fail to start, then that doesn't help. But it still manages
> to detect the failure in that case because it reaches
> WaitForParallelWorkersToFinish, which we just patched.
>
> But how does that happen, anyway? Shouldn't it get stuck waiting for
> the tuple queues to which nobody's attached yet? The reason it
> doesn't is because
> ExecParallelCreateReaders() calls shm_mq_set_handle() for each queue,
> which causes the tuple queues to be regarded as detached if the worker
> fails to start. A detached tuple queue, in general, is not an error
> condition: it just means that worker has no more tuples.

This explains all the confusion. Amit told me that using a tuple queue
made all the difference here. Even still, it seemed surprising that
we'd rely on that from such a long distance away, within nodeGather.c.
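
For anyone following along, the mechanism boils down to one call. This
is a simplified sketch of the ExecParallelCreateReaders() pattern (the
surrounding loop and error handling are elided):

    /*
     * Associate each tuple queue with the corresponding worker's
     * BackgroundWorkerHandle.  This is what makes a worker whose
     * fork() failed look like a queue that detached cleanly.
     */
    shm_mq_set_handle(pei->tqueue[i], pcxt->worker[i].bgwhandle);

With the handle set, shm_mq_receive() can notice that the counterparty
is known to have stopped (or never started) and returns SHM_MQ_DETACHED
rather than blocking forever -- which nodeGather.c then interprets as
"that worker has no more tuples".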

> I guess that works, but it seems more like blind luck than good
> design. Parallel CREATE INDEX fails to be as "lucky" as Gather
> because there's nothing in parallel CREATE INDEX that lets it skip
> waiting for a worker which doesn't start -- and in general it seems
> unacceptable to impose a coding requirement that all future parallel
> operations must fail to notice the difference between a worker that
> completed successfully and one that never ran at all.

+1.
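
To make the hazard concrete: the leader in parallel CREATE INDEX must
wait for every launched worker to finish its share of the scan, along
these lines (a rough sketch -- the struct fields are placeholder names,
and the wait event is reused here purely for illustration):

    /* Leader: wait until all launched workers report completion. */
    for (;;)
    {
        bool        done;

        SpinLockAcquire(&shared->mutex);
        done = (shared->nworkersdone == pcxt->nworkers_launched);
        SpinLockRelease(&shared->mutex);
        if (done)
            break;
        ConditionVariableSleep(&shared->workersdonecv,
                               WAIT_EVENT_PARALLEL_FINISH);
    }
    ConditionVariableCancelSleep();

If one of the workers counted in nworkers_launched never actually
starts, nworkersdone can never catch up, and the leader waits here
forever. There is no tuple queue to conveniently appear detached.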

> If we made the Gather node wait only for workers that actually reached
> the Gather -- either by using a Barrier or by some other technique --
> then this would be a lot less fragile. And the same kind of technique
> would work for parallel CREATE INDEX.

The use of a barrier has problems of its own [1], though, of which one
is unique to the parallel_leader_participation=off case. I thought
that you yourself agreed with this [2] -- do you?
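
To see why, consider the choreography that a dynamic-party barrier
implies. This is my sketch of how barrier.h might be pressed into
service -- not code from any patch, and the phase constant is made up:

    /* Leader, before launching workers: a party of one. */
    BarrierInit(&shared->gather_barrier, 1);

    /* Worker, on startup: join, unless we arrived too late. */
    if (BarrierAttach(&shared->gather_barrier) != GATHER_PHASE_ATTACHING)
    {
        BarrierDetach(&shared->gather_barrier);
        proc_exit(0);
    }

    /* Leader: wait only for workers that attached in time. */
    BarrierArriveAndWait(&shared->gather_barrier, 0 /* wait event */);

Note that a worker that hasn't attached by the time the leader arrives
simply isn't waited for, so the leader still has to tell "slow to
start" apart from "never started" by some other means.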

Another argument against using a barrier in this way is that it seems
like way too much mechanism to solve a simple problem. Why should a
client of parallel.h not be able to rely on nworkers_launched (perhaps
only after "verifying it can be trusted")? That seems like a pretty
reasonable requirement for clients to impose on any framework for
parallel imperative programming.

I think that we should implement "some other technique", instead of
using a barrier. As I've said, Amit's WaitForParallelWorkersToAttach()
idea seems like a promising "other technique".
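
Part of the appeal is how little that asks of callers. I'd imagine
usage as simple as this (hypothetical, of course, since the function
doesn't exist yet -- I'm assuming a signature that just takes the
ParallelContext):

    LaunchParallelWorkers(pcxt);

    /*
     * Don't trust nworkers_launched just yet: a registered worker can
     * still fail at fork() time.  Block until every launched worker
     * has either attached to its error queue or is known to have
     * failed to start (in which case we'd error out).
     */
    WaitForParallelWorkersToAttach(pcxt);

    /* From here on, pcxt->nworkers_launched can be trusted. */

That would let parallel CREATE INDEX wait for exactly
pcxt->nworkers_launched workers without any fear of hanging.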

[1] https://www.postgresql.org/message-id/CAA4eK1+a0OF4M231vBgPr_0Ygg_BNmRGZLiB7WQDE-FYBSyrGg@mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+TgmoaF8UA8v8hP=CcoqUc50pucPC8ABj-_yyC++yGggjWFsw@mail.gmail.com
--
Peter Geoghegan
