Re: parallel.c oblivion of worker-startup failures

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel.c oblivion of worker-startup failures
Date: 2017-09-19 03:17:06
Message-ID: CAA4eK1JAuXhK8Fdw_Rwpg4H4yLWz1pBL-3Znnea7o1QKF4L6yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 18, 2017 at 10:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>> Attached patch fixes these problems.
>
> Hmm, this patch adds a kill(notify_pid) after one call to
> ForgetBackgroundWorker, but the postmaster has several more such calls.
> Shouldn't they all notify the notify_pid? Should we move that
> functionality into ForgetBackgroundWorker itself, so we can't forget
> it again?
>

Among other places, we already notify during
ReportBackgroundWorkerExit(). However, it seems to me that all other
places except where this patch has added a call to notify doesn't need
such a call. The other places like in DetermineSleepTime and
ResetBackgroundWorkerCrashTimes is called for a crashed worker which I
don't think requires notification to the backend as that backend
itself would have restarted. The other place where we call
ForgetBackgroundWorker is in maybe_start_bgworkers when rw_terminate
is set to true which again seems to be either the case of worker crash
or when someone has explicitly asked to terminate the worker in which
case we already send a notification.

I think we need to notify the backend on start, stop and failure to
start a worker. OTOH, if it is harmless to send a notification even
after the worker is crashed, then we can just move that functionality
into ForgetBackgroundWorker itself as that will simplify the code and
infact that is the first thing that occurred to me as well, but I
haven't done that way as I was not sure if we want to send
notification in all kind of cases.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-09-19 03:40:58 Re: Setting pd_lower in GIN metapage
Previous Message Andres Freund 2017-09-19 02:54:49 Re: pgbench tap tests & minor fixes.