Re: Wait for parallel workers to attach

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: Wait for parallel workers to attach
Date: 2018-01-28 22:31:40
Message-ID: CAH2-Wz=AmMJP8-0GfQJQgL+maJVGKpcB7U4m0tzwRdtOy5DX7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 27, 2018 at 12:14 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> During the recent development of parallel operation (parallel create
> index)[1], a need has been arised for $SUBJECT. The idea is to allow
> leader backend to rely on number of workers that are successfully
> started. This API allows leader to wait for all the workers to start
> or fail even if one of the workers fails to attach. We consider
> workers started/attached once they are attached to error queue. This
> will ensure that any error after the workers are attached won't be
> silently ignored by leader.

I've modified my working copy of the parallel CREATE INDEX patch to
call WaitForParallelWorkersToAttach(), just after the leader
participates as a worker.

> I have tested this patch by calling this API in nodeGather.c and then
> introducing failuires at various places: (a) induce fork failure for
> background workers (force_fork_failure_v1.patch), (b) Exit parallel
> worker before attaching to the error queue
> (exit_parallel_worker_before_error_queue_attach_v1.patch) and (c) Exit
> parallel worker after attaching to the error queue
> (exit_parallel_worker_after_error_queue_attach_v1.patch).
>
> In all above cases, I got the errors as expected.

I also found that all of these errors were propagated. The patch helps
parallel CREATE INDEX as expected/designed.

Some small things that I noticed about the patch:

* Maybe "if (!known_started_workers[i])" should be converted to "if
(known_started_workers[i]) continue;", to decrease the indentation
level of the WaitForParallelWorkersToAttach() loop.

* There might be some opportunity to share some of the new code with
the code recently committed to WaitForParallelWorkersToFinish(). For
one thing, the logic in this block could be refactored into a
dedicated function that is called by both
WaitForParallelWorkersToAttach() and WaitForParallelWorkersToFinish():

> + else if (status == BGWH_STOPPED)
> + {
> + /*
> + * Check whether the worker ended up stopped without ever
> + * attaching to the error queue. If so, the postmaster
> + * was unable to fork the worker or it exited without
> + * initializing properly. We must throw an error, since
> + * the caller may have been expecting the worker to do
> + * some work before exiting.
> + */
> + mq = shm_mq_get_queue(pcxt->worker[i].error_mqh);
> + if (shm_mq_get_sender(mq) == NULL)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("parallel worker failed to initialize"),
> + errhint("More details may be available in the server log.")));
> +
> + known_started_workers[i] = true;
> + ++nknown_started_workers;
> + }

* If we don't actually commit the patch to make nodeGather.c call
WaitForParallelWorkersToAttach(), which I suspect will happen, then I
think we should instead at least have a comment saying why it's okay
that we don't call WaitForParallelWorkersToAttach(). If we go this
way, the comment should directly replace the
modify_gather_to_wait_for_attach_v1.patch call to
WaitForParallelWorkersToAttach() -- this comment should go in
ExecGather().

* Maybe the comments at the top of WaitForParallelWorkersToAttach()
should at least allude to the ExecGather() special case I just
mentioned.

* Maybe the comments at the top of WaitForParallelWorkersToAttach()
should also advise callers that it's a good idea to try to do other
leader-only work that doesn't involve a WaitLatch() before they call.

In general, WaitForParallelWorkersToAttach() needs to be called before
any WaitLatch() (or barrier wait, or condition variable wait) that
waits on workers, and after workers are first launched, but callers
may be able to arrange to do plenty of other work, just like nbtsort.c
does. To be clear: IMHO calling WaitForParallelWorkersToAttach()
should be the rule, not the exception.

* Finally, perhaps the comments at the top of
WaitForParallelWorkersToAttach() should also describe how it relates
to WaitForParallelWorkersToFinish().

ISTM that WaitForParallelWorkersToAttach() is a subset of
WaitForParallelWorkersToFinish(), that does all that is needed to rely
on nworkers_launched actually being the number of worker processes
that are attached to the error queue. As such, caller can expect
propagation of errors from workers using standard parallel message
interrupts once WaitForParallelWorkersToAttach() returns. You probably
shouldn't directly reference nworkers_launched, though, since that
doesn't necessarily have to be involved for parallel client code to
run into trouble. (You only need to wait on workers changing something
in shared memory, and failing to actively inform leader of failure
through a parallel message -- this might not involve testing
nworkers_launched in the way parallel CREATE INDEX happens to.)

All in all, I'm very pleased with where this leaves parallel CREATE
INDEX. I hope that Robert can review and commit this patch quickly, so
that I can use the new infrastructure. I can then post what I hope to
be the final revision of parallel CREATE INDEX. ISTM that the question
of handling things like parallel worker fork() failure is the only
real blocker to committing parallel CREATE INDEX, since we've reached
agreement on all other issues.

Thanks
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2018-01-28 23:00:54 Re: Built-in connection pooling
Previous Message Dmitry Dolgov 2018-01-28 22:09:47 Re: Write lifetime hints for NVMe