Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2017-12-19 10:01:05
Message-ID: CAA4eK1JYZeiA5g4ciZtRT3=73gt3O+hgMk3e6dwTBUhaZcDGBA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 14, 2017 at 3:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 13, 2017 at 1:41 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
>> This also doesn't appear to be completely safe. If we add
>> proc_exit(1) after attaching to error queue (say after
>> pq_set_parallel_master) in the worker, then it will lead to *hang* as
>> anyone_alive will still be false and as it will find that the sender
>> is set for the error queue, it won't return any failure. Now, I think
>> even if we check worker status (which will be stopped) and break after
>> the new error condition, it won't work as it will still return zero
>> rows in the case reported by you above.
>
> Hmm, there might still be a problem there. I was thinking that once
> the leader attaches to the queue, we can rely on the leader reaching
> "ERROR: lost connection to parallel worker" in HandleParallelMessages.
> However, that might not work because nothing sets
> ParallelMessagePending in that case. The worker will have detached
> the queue via shm_mq_detach_callback, but the leader will only
> discover the detach if it actually tries to read from that queue.
>

I think this problem would be much easier to fix if we had some way to
tell whether a worker stopped gracefully or not. Do you think it makes
sense to introduce such a state in the background worker machinery?
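
A rough, standalone sketch of the kind of state this refers to. Every name
below is hypothetical and chosen only for illustration; none of it is part
of the existing background worker API. It assumes the leader could learn
not only that a worker has stopped, but also how it stopped, and could then
choose between finishing normally and raising an error instead of waiting
forever or returning zero rows.

```c
/*
 * Hypothetical worker states. These are NOT PostgreSQL's BgwHandleStatus
 * values; they only illustrate splitting "stopped" into two cases.
 */
typedef enum SketchWorkerState
{
    SKETCH_WORKER_RUNNING,          /* worker is still alive */
    SKETCH_WORKER_EXITED_CLEANLY,   /* worker finished its work and exited */
    SKETCH_WORKER_EXITED_ABNORMALLY /* e.g. proc_exit(1) before doing any work */
} SketchWorkerState;

/* What the leader should do when it has nothing to read from a worker. */
typedef enum SketchLeaderAction
{
    SKETCH_KEEP_WAITING,
    SKETCH_FINISH_NORMALLY,
    SKETCH_RAISE_ERROR
} SketchLeaderAction;

/*
 * Hypothetical decision helper for the leader. Without the extra state,
 * the two "exited" cases look the same, so the leader cannot tell an
 * empty result from a failed worker.
 */
static SketchLeaderAction
sketch_leader_decide(SketchWorkerState state)
{
    switch (state)
    {
        case SKETCH_WORKER_RUNNING:
            return SKETCH_KEEP_WAITING;
        case SKETCH_WORKER_EXITED_CLEANLY:
            return SKETCH_FINISH_NORMALLY;
        case SKETCH_WORKER_EXITED_ABNORMALLY:
            return SKETCH_RAISE_ERROR;
    }
    /* Unknown state: treat it as a failure rather than hang. */
    return SKETCH_RAISE_ERROR;
}
```

How such a state would be recorded and reported to the leader is left open
here; the sketch only shows the distinction the leader would need to make.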

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
