Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2017-12-08 16:15:25
Message-ID: CA+TgmoYN9RJn=aCPijrFB2pEFg3q7anutkfp0FVBFDydXcTQug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 8, 2017 at 4:56 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I think the optimization you are suggesting has a merit over what I
> have done in the patch. However, one point to note is that we are
> anyway going to call that function down in the path(
> WaitForParallelWorkersToExit->WaitForBackgroundWorkerShutdown->GetBackgroundWorkerPid),
> so we might want to take the advantage of calling it first time as
> well. We can actually cache the status of workers that have returned
> BGWH_STOPPED and use it later so that we don't need to make this
> function call again for workers which are already stopped. We can
> even do what Tom is suggesting to avoid calling it for workers which
> are known to be launched, but I am really not sure if that function
> call is costly enough that we need to maintain one extra state to
> avoid that.

Well, let's do what optimization we can without making it too complicated.

> While looking into this code path, I wonder how much we are gaining by
> having two separate calls (WaitForParallelWorkersToFinish and
> WaitForParallelWorkersToExit) to check the status of workers after
> finishing the parallel execution? They are used together in rescan
> code path, so apparently there is no benefit calling them separately
> there. OTOH, during shutdown of nodes, it will often be the case that
> they will be called in a short duration after each other except for
> the case where we need to shut down the node for the early exit like
> when there is a limit clause or such.

I'm not 100% sure either, but if we're going to do anything about
that, it seems like a topic for another patch. I don't think it's
completely without value because there is some time after we
WaitForParallelWorkersToFinish and before we
WaitForParallelWorkersToExit during which we can do things like
retrieve instrumentation data and shut down other nodes, but whether
it pulls it weight in code I don't know. This kind of grew up
gradually: originally I/we didn't think of all of the cases where we
needed the workers to actually exit rather than just indicating that
they were done generating tuples, and the current situation is the
result of a series of bug-fixes related to that oversight, so it's
quite possible that a redesign would make sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-12-08 16:25:07 Re: [HACKERS] What does it mean by XLOG_BACKUP_RECORD?
Previous Message Robert Haas 2017-12-08 15:57:12 Re: [HACKERS] Runtime Partition Pruning