Re: Get stuck when dropping a subscription during synchronizing table

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Get stuck when dropping a subscription during synchronizing table
Date: 2017-06-01 03:42:28
Message-ID: CAD21AoB7XFY521BkVBjC3BPuVRASxZFdw3osiwVcgdJPSdZDAQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 25, 2017 at 4:14 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> I finally had time to properly analyze this, and it turns out we've all
> just been trying to fix symptoms and not the actual problems.
>
> All the locking works just fine the way it is in master. The issue with
> deadlock with apply comes from the wrong handling of the SIGTERM in the
> apply (we didn't set InterruptPending). I changed the SIGTERM handler in
> patch 0001 just to die which is actually the correct behavior for apply
> workers. I also moved the connection cleanup code to the
> before_shmem_exit callback (similar to walreceiver) and now that part
> works correctly.
>
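
To illustrate the SIGTERM point for anyone following along, here is a toy
model in plain C. It is not code from the patches or from PostgreSQL: every
name below (interrupt_pending, die_pending, got_sigterm, and the functions)
is a hypothetical stand-in. It only assumes that the generic interrupt check
looks at one "pending" flag first, so a handler that records the signal
somewhere else is never noticed by that check.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-process flags; NOT PostgreSQL's definitions. */
volatile sig_atomic_t interrupt_pending = 0; /* "there is something to check" */
volatile sig_atomic_t die_pending = 0;       /* "termination was requested" */
volatile sig_atomic_t got_sigterm = 0;       /* private flag of the old-style handler */

/* Old-style handler: remembers the signal in a private flag only. */
void sigterm_private_flag(int signo)
{
    (void) signo;
    got_sigterm = 1;
}

/* die-style handler: raises the flags that the generic check looks at. */
void sigterm_die_style(int signo)
{
    (void) signo;
    interrupt_pending = 1;
    die_pending = 1;
}

/* Generic interrupt check; returns true if the worker should exit now. */
bool interrupt_check_says_exit(void)
{
    if (!interrupt_pending)
        return false;           /* fast path: nothing flagged, nothing examined */
    interrupt_pending = 0;
    return die_pending != 0;
}

/* Install a handler, send ourselves SIGTERM, then run the generic check. */
bool worker_exits_after_sigterm(void (*handler)(int))
{
    void (*prev)(int);

    interrupt_pending = 0;
    die_pending = 0;
    got_sigterm = 0;

    prev = signal(SIGTERM, handler);
    assert(prev != SIG_ERR);
    (void) prev;

    raise(SIGTERM);             /* the handler has run by the time raise() returns */
    return interrupt_check_says_exit();
}
```

With the private-flag handler the signal is recorded but the generic check
never reports it; with the die-style handler the same check asks the worker
to exit.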
> The issue with orphaned sync workers is actually two separate issues.
> First, due to a thinko we always searched for the sync worker in
> wait_for_sync_status_change instead of searching for the opposite worker as
> was the intention (i.e. sync worker should search for apply and apply
> should search for sync). That's fixed by 0002. And second, we didn't
> accept any invalidation messages until the whole sync process finished
> (because it flattens all the remote transactions into a single one) so
> the sync worker didn't learn about subscription changes/drop until it had
> finished, which I have now fixed in 0003.
>
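
The "search for the opposite worker" point can be sketched with a small
standalone model. This is only an illustration under my own assumptions, not
the code in 0002: the Worker struct, NO_REL, and the find_* functions are
hypothetical, and an apply worker is modelled simply as a worker with no
relation assigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_REL 0u               /* hypothetical marker: worker has no table, i.e. apply */

typedef struct Worker
{
    bool     in_use;            /* slot is occupied by a live worker */
    unsigned subid;             /* subscription the worker belongs to */
    unsigned relid;             /* table being synced, or NO_REL for apply */
} Worker;

/* Look up a live worker by subscription and relation. */
const Worker *find_worker(const Worker *w, size_t n,
                          unsigned subid, unsigned relid)
{
    for (size_t i = 0; i < n; i++)
        if (w[i].in_use && w[i].subid == subid && w[i].relid == relid)
            return &w[i];
    return NULL;
}

/* The thinko: always look for the sync worker of relid, whoever is asking. */
const Worker *find_peer_buggy(const Worker *w, size_t n,
                              const Worker *me, unsigned relid)
{
    assert(me != NULL);
    return find_worker(w, n, me->subid, relid);
}

/* Intended behaviour: sync looks for apply, apply looks for sync. */
const Worker *find_peer_fixed(const Worker *w, size_t n,
                              const Worker *me, unsigned relid)
{
    assert(me != NULL);
    unsigned peer_relid = (me->relid != NO_REL) ? NO_REL : relid;
    return find_worker(w, n, me->subid, peer_relid);
}
```

In this model, a sync worker whose apply worker has gone away still finds
"a worker" with the buggy lookup (itself), so it keeps waiting; the fixed
lookup returns NULL and the sync worker can bail out.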
> There is still an outstanding issue that the sync worker will keep running
> inside the long COPY because the invalidation messages are also not
> processed until it finishes, but all the original issues reported here
> disappear for me with the attached patches applied.
>

These patches conflict with current HEAD, so I have attached updated versions.

Also, the issue that the sync worker keeps running inside a long COPY
can lead to another problem: the user cannot create a new subscription
with workers until the long COPY finishes, because there are not enough
free logical replication worker slots. The attached 0004 patch is an
updated version of the patch I submitted before.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment                                                       Content-Type              Size
0001-Fix-signal-handling-in-logical-workers.patch                application/octet-stream  12.0 KB
0002-Make-tablesync-worker-exit-when-apply-dies-while-it-.patch  application/octet-stream  2.2 KB
0003-Receive-invalidation-messages-correctly-in-tablesync.patch  application/octet-stream  2.3 KB
0004-Wait-for-table-sync-worker-to-finish-when-apply-work.patch  application/octet-stream  2.9 KB
