Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2017-12-21 14:13:28
Message-ID: CAA4eK1L0QoS0VSG=guyFiTM0TgoSAoLewSkj0XP-D9tGJ-nDLA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 21, 2017 at 6:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 21, 2017 at 6:46 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> What if we don't allow to reuse such slots till the backend/session
>> that has registered it performs unregister? Currently, we don't seem
>> to have an API corresponding to Register*BackgroundWorker() which can
>> be used to unregister, but maybe we can provide such an API.
>
> Well, then we could have slots pinned down for a long time, if the
> backend never gets around to calling unregister. Furthermore, that's
> absolutely not back-patchable, because we can't put a requirement like
> that on code running in the back branches. Also, what if the code
> path that would have done the unregister eventually errors out? We'd
> need TRY/CATCH blocks everywhere that registers the worker. In short,
> this seems terrible for multiple reasons.
>
>>> Furthermore, it doesn't help in the case where the worker starts and
>>> immediately exits without attaching to the DSM.
>>
>> Yeah, but can't we detect that case? After the worker exits, we can
>> know its exit status as is passed to CleanupBackgroundWorker, we can
>> use that to mark the worker state as BGWH_ERROR_STOPPED (or something
>> like BGWH_IMMEDIATE_STOPPED).
>>
>> I think above way sounds invasive, but it seems to me that it can be
>> used by other users of background workers as well.
>
> The exit status doesn't tell us whether the worker attached to the DSM.
>
> I'm relatively puzzled as to why you're rejecting a relatively
> low-impact way of handling a corner case that was missed in the
> original design in favor of major architectural changes.
>

I am not against the approach you described above, which is specific
to the parallel context layer. However, I was trying to see whether
there is a general-purpose solution, since the low-impact way is not
very straightforward. I think you can go ahead with the approach you
described to fix the hole I was pointing to; I can review it, or I can
give it a try myself if you want.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
