Re: Get stuck when dropping a subscription during synchronizing table

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get stuck when dropping a subscription during synchronizing table
Date: 2017-05-12 08:24:07
Message-ID: CAD21AoDJihMvdiZv7d_bpMPUK1G379WfxWpeanmJVn1KvEGy0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 12, 2017 at 11:24 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek
> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>> On 11/05/17 10:10, Masahiko Sawada wrote:
>>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>> Barring any objections, I'll add these two issues to open item.
>>>>
>>>> It seems to me that those open items have not been added yet to the
>>>> list. If I am following correctly, they could be defined as follows:
>>>> - Dropping subscription may stuck if done during tablesync.
>>>> -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker process.
>>
>> I think the solution to this is to reintroduce the LWLock that was
>> removed and replaced with the exclusive lock on catalog [1]. I am afraid
>> that correct way of handling this is to do both LWLock and catalog lock
>> (first LWLock under which we kill the workers and then catalog lock so
>> that something that prevents launcher from restarting them is held till
>> the end of transaction).
>
> I agree to reintroduce LWLock and to stop logical rep worker first and
> then modify catalog. That way we can reduce catalog lock level (maybe
> to RowExclusiveLock) so that apply worker can see it. Also I think
> that we need to do more things like in order to prevent that we keep
> to hold LWLock until end of transaction, because holding LWLock until
> end of transaction is not good idea and could be cause of deadlock. So
> for example we can commit the transaction in DropSubscription after
> cleaned pg_subscription record and all its dependencies and then start
> new transaction for the remaining work. Of course we also need to
> disallow DROP SUBSCRIPTION being executed in a user transaction
> though.

Attached two draft patches to solve these issues.

Attached 0001 patch reintroduces LogicalRepLauncherLock and makes DROP
SUBSCRIPTION keep holding it until commit. To prevent from deadlock
possibility, I disallowed DROP SUBSCRIPTION being called in a
transaction block. But there might be more sensible solution for this.
please give me feedback.

>
>>
>>>> -- Avoid orphaned tablesync worker if apply worker exits before
>>>> changing its status.
>>>
>>
>> The behavior question I have about this is if sync workers should die
>> when apply worker dies (ie they are tied to apply worker) or if they
>> should be tied to the subscription.
>>
>> I guess taking down all the sync workers when apply worker has exited is
>> easier to solve. Of course it means that if apply worker restarts in
>> middle of table synchronization, the table synchronization will have to
>> start from scratch. That being said, in normal operation apply worker
>> should only exit/restart if subscription has changed or has been
>> dropped/disabled and I think sync workers want to exit/restart in that
>> situation as well.
>
> I agree that sync workers are tied to the apply worker.
>
>>
>> So for example having shmem detach hook for an apply worker (or reusing
>> the existing one) that searches for all the other workers for same
>> subscription and shuts them down as well sounds like solution to this.
>
> Seems reasonable solution.
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
0001-Fix-a-deadlock-bug-between-DROP-SUBSCRIPTION-and-app.patch application/octet-stream 6.0 KB
0002-Wait-for-table-sync-worker-to-finish-when-apply-work.patch application/octet-stream 3.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Kellerer 2017-05-12 08:35:20 Re: CTE inlining
Previous Message Amit Langote 2017-05-12 08:16:24 Re: [POC] hash partitioning