Re: Get stuck when dropping a subscription during synchronizing table

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get stuck when dropping a subscription during synchronizing table
Date: 2017-05-10 02:57:54
Message-ID: CAD21AoB6MJfRxMJpLSEvqMyigU9BSP5aMBYG28QpbcW2C1X8FA@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 10, 2017 at 2:46 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, May 8, 2017 at 8:42 PM, Petr Jelinek
> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>> On 08/05/17 11:27, Masahiko Sawada wrote:
>>> Hi,
>>>
>>> I encountered a situation where DROP SUBSCRIPTION got stuck while the
>>> initial table sync was in progress. In my environment, I created
>>> several tables with some data on the publisher, then created a
>>> subscription on the subscriber and dropped it immediately afterwards.
>>> It doesn't always happen, but I hit it often in my environment.
>>>
>>> ps -x command shows the following.
>>>
>>> 96796 ? Ss 0:00 postgres: masahiko postgres [local] DROP SUBSCRIPTION
>>> 96801 ? Ts 0:00 postgres: bgworker: logical replication worker for subscription 40993 waiting
>>> 96805 ? Ss 0:07 postgres: bgworker: logical replication worker for subscription 40993 sync 16418
>>> 96806 ? Ss 0:01 postgres: wal sender process masahiko [local] idle
>>> 96807 ? Ss 0:00 postgres: bgworker: logical replication worker for subscription 40993 sync 16421
>>> 96808 ? Ss 0:00 postgres: wal sender process masahiko [local] idle
>>>
>>> The DROP SUBSCRIPTION process (pid 96796) is waiting for the apply
>>> worker process (pid 96801) to stop while holding a lock on
>>> pg_subscription_rel. On the other hand, the apply worker is waiting to
>>> acquire a tuple lock on pg_subscription_rel needed for heap_update.
>>> Also, the table sync workers (pid 96805 and 96807) are waiting for the
>>> apply worker process to change their status.
>>>
>>
>> Looks like we should kill apply before dropping dependencies.
>
> Sorry, after investigating I found that the DROP SUBSCRIPTION process
> is holding AccessExclusiveLock on pg_subscription (not
> pg_subscription_rel) and the apply worker is waiting to acquire a lock
> on it.

Hmm, it seems there are two cases. One is that the apply worker waits
to acquire AccessShareLock on pg_subscription, but DropSubscription has
already acquired AccessExclusiveLock on it and waits for the apply
worker to finish. The other is that the apply worker waits to acquire
a tuple lock on pg_subscription_rel, but DropSubscription (maybe while
dropping dependencies) has already acquired it.
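
Both cases have the same shape: DROP SUBSCRIPTION holds a lock and
waits for the apply worker, while the apply worker waits for that lock.
As a rough illustration only, the toy Python model below encodes the
waits from the ps output as a wait-for graph and looks for a cycle. The
node names and the find_cycle helper are made up for this sketch; none
of this is PostgreSQL code, and the "fixed" ordering is just the
assumption that the apply worker is stopped before the lock is taken.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    Each key waits on exactly one other node (its value).
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = start
        while node in waits_for:
            node = waits_for[node]
            if node in seen:
                # Drop any lead-in nodes so only the cycle itself remains.
                return path[path.index(node):]
            path.append(node)
            seen.add(node)
    return None


# Hypothetical labels for the processes in the report.
stuck = {
    "drop_subscription": "apply_worker",  # waits for the worker to stop
    "apply_worker": "drop_subscription",  # waits for the lock it holds
    "tablesync_16418": "apply_worker",    # waits for a status change
    "tablesync_16421": "apply_worker",
}

# Assumed ordering for illustration: the apply worker is stopped before
# DROP SUBSCRIPTION takes the lock, so neither of the two edges that
# form the cycle exists any more.
fixed = {
    "tablesync_16418": "apply_worker",
    "tablesync_16421": "apply_worker",
}

if __name__ == "__main__":
    print("stuck cycle:", find_cycle(stuck))
    print("fixed cycle:", find_cycle(fixed))
```

In this model the table sync workers are not part of the cycle
themselves; they are stuck only because they wait on a process that is
in it.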

> So I guess that dropping the dependencies is not relevant to
> this. It seems to me that the main cause is that DROP SUBSCRIPTION
> waits for the apply worker to finish while still holding
> AccessExclusiveLock on pg_subscription. Perhaps we need to find a
> way to reduce the lock level somehow.
>
>>
>>> Also, even when DROP SUBSCRIPTION completes successfully, the table sync
>>> worker can be orphaned, because I guess the apply worker can exit
>>> before changing the status of the table sync worker.
>>
>> Well the tablesync worker should stop itself if the subscription got
>> removed, but of course again the dependencies are an issue, so we should
>> probably kill those explicitly as well.
>
> Yeah, I think that we should ensure that the apply worker exits only
> after it has killed all the table sync workers involved.
>

Barring any objections, I'll add these two issues to the open items.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
