Re: Get stuck when dropping a subscription during synchronizing table

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get stuck when dropping a subscription during synchronizing table
Date: 2017-06-27 16:47:31
Message-ID: 3e06c16c-f441-c606-985c-6d6239097f54@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 27/06/17 10:51, Masahiko Sawada wrote:
> On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've reviewed this patch briefly.

Thanks!

>
> @@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid)
> }
>
> /*
> + * Request worker to be stopped on commit.
> + */
> +void
> +logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
> +{
> + LogicalRepWorkerId *wid;
> + MemoryContext old;
> +
> + old = MemoryContextSwitchTo(TopTransactionContext);
> +
> + wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId));
> +
> + /*
> + wid = MemoryContextAlloc(TopTransactionContext,
> +
> sizeof(LogicalRepWorkerId));
> + */
> + wid->subid = subid;
> + wid->relid = relid;
> +
> + on_commit_stop_workers = lappend(on_commit_stop_workers, wid);
> +
> + MemoryContextSwitchTo(old);
> +}
>
> logicalrep_worker_stop_at_commit() has a problem that new_list()
> called by lappend() allocates the memory from current memory context.
> It should switch the memory context and then allocate the memory for
> wid and append it to the list.
>

Right, fixed (I see you did that locally as well based on the above
excerpt ;) ).

> --------
> @@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void)
> void
> AtEOXact_ApplyLauncher(bool isCommit)
> {
> - if (isCommit && on_commit_launcher_wakeup)
> - ApplyLauncherWakeup();
> + ListCell *lc;
> +
> + if (isCommit)
> + {
> + foreach (lc, on_commit_stop_workers)
> + {
> + LogicalRepWorkerId *wid = lfirst(lc);
> + logicalrep_worker_stop(wid->subid, wid->relid);
> + }
> +
> + if (on_commit_launcher_wakeup)
> + ApplyLauncherWakeup();
>
> Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't
> support the prepared transaction. Since we allocate the list
> on_commit_stop_workers in TopTransactionContext the postgres crashes
> if we execute any query after prepared transaction that removes
> subscription relation state. Also after fixed this issue, we still
> need to something: the list of on_commit_stop_workers is not
> associated the prepared transaction. A query next to "preapre
> transaction" tries to stop workers at the commit. There was similar
> discussion before.
>

Hmm, good point. I think for now it makes sense to simply don't allow
PREPARE for transactions that manipulate workers similar to what we do
when there are exported snapshots. Done it that way in attached.

> --------
> +
> + ensure_transaction();
> +
> UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
> +
> rstate->relid, rstate->state,
> +
> rstate->lsn);
>
>
> Should we commit the transaction if we started a new transaction
> before update the subscription relation state, or it could be deadlock
> risk.

We only lock the whole subscription (and only conflicting things are
DROP and ALTER SUBSCRIPTION), not individual subscription-relation
mapping so it doesn't seem to me like there is any higher risk of
deadlocks than anything else which works with multiple tables (or
compared to previous behavior).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v2-0001-Rework-subscription-worker-and-relation-status-handl.patch text/x-patch 40.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-06-27 17:57:15 Re: memory layouts for binary search in nbtree
Previous Message Merlin Moncure 2017-06-27 15:50:51 Re: lag(bigint,int,int), etc?