Re: DROP SUBSCRIPTION and ROLLBACK

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DROP SUBSCRIPTION and ROLLBACK
Date: 2017-02-07 14:53:09
Message-ID: CAHGQGwEgOKJAXKHkiXVs6snQGpWQZo04Y=oLZqQV07vcJmxxPw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 7, 2017 at 9:10 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Hi all,
>
> While testing logical replication I found that if the transaction
> that issued DROP SUBSCRIPTION rolls back, logical replication stops
> and the subscription can never be removed later. The documentation says
> that the replication worker associated with the subscription will not
> stop until after the transaction that issued this command has
> committed, but it doesn't work that way.

Yeah, this is a bug.

ISTM that CREATE SUBSCRIPTION has a similar issue. It creates
the replication slot on the publisher side before the transaction has been
committed. Even if the transaction is rolled back, that replication slot is
not removed. That is, in a transaction block, we should not connect to
the publisher. Instead, the launcher or worker should do so.

> The cause of this is that DropSubscription stops the apply worker and
> drops the corresponding replication slot on the publisher side without waiting
> for commit or rollback. The launcher process launches the apply worker
> again, but the launched worker fails to start logical replication
> because the corresponding replication slot has already been removed. And the
> orphaned subscription cannot be removed later.
>
> I think the logical replication should not stop and the corresponding
> replication slot and replication origin should not be removed until
> the transaction commits.

Yes.

> The solution I came up with is that the launcher process
> stops the apply worker after DROP SUBSCRIPTION is committed, rather
> than DropSubscription doing so. The apply worker then drops the replication
> slot and replication origin before it exits. The attached draft patch fixes
> this issue.
>
> Please give me feedback.

The patch failed to apply to HEAD.

+ worker = logicalrep_worker_find(subid);
+ if (worker)
{
- heap_close(rel, NoLock);
- return;
+ if (stmt->drop_slot)
+ worker->drop_slot = true;
+ worker->need_to_stop = true;

"drop_slot" and "need_to_stop" seem to be set to true even if the transaction
is rolled back. This would cause the same problem that you're trying to fix.

I think that we should make the launcher periodically check pg_subscription
and stop any worker that has no corresponding subscription. Then,
if necessary, the worker should remove its replication slot from the publisher.
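
To make the idea concrete, here is a rough sketch of one launcher pass. It is a toy model, not PostgreSQL code: ToyWorker, toy_subscription_exists and toy_launcher_reconcile are hypothetical names, and the "catalog" array just stands in for the subscription OIDs the launcher would see in pg_subscription, i.e. committed rows only. Removing the slot on the publisher is left out.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Hypothetical model of one apply worker entry. */
typedef struct ToyWorker
{
    Oid     subid;              /* subscription this worker serves */
    bool    running;            /* is the worker still alive? */
} ToyWorker;

/* Does subid appear among the committed subscription OIDs? */
bool
toy_subscription_exists(const Oid *catalog, size_t ncatalog, Oid subid)
{
    for (size_t i = 0; i < ncatalog; i++)
    {
        if (catalog[i] == subid)
            return true;
    }
    return false;
}

/*
 * One launcher pass: stop every running worker whose subscription is no
 * longer visible in the catalog.  Returns the number of workers stopped.
 *
 * Nothing is written to shared state by DROP SUBSCRIPTION itself, so a
 * rolled-back DROP leaves the row visible and the worker keeps running.
 */
int
toy_launcher_reconcile(ToyWorker *workers, size_t nworkers,
                       const Oid *catalog, size_t ncatalog)
{
    int     stopped = 0;

    for (size_t i = 0; i < nworkers; i++)
    {
        ToyWorker  *w = &workers[i];

        if (!w->running)
            continue;           /* already stopped */
        if (toy_subscription_exists(catalog, ncatalog, w->subid))
            continue;           /* subscription still exists */

        w->running = false;     /* orphaned worker: stop it */
        stopped++;
    }
    return stopped;
}
```

With workers for subscriptions 16384 and 16385 and a catalog that contains only 16384, one pass stops just the second worker. If the DROP is rolled back, both rows are still visible and the pass stops nothing.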

Regards,

--
Fujii Masao
