Re: Get stuck when dropping a subscription during synchronizing table

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get stuck when dropping a subscription during synchronizing table
Date: 2017-06-27 08:51:31
Message-ID: CAD21AoBZoUxT4a12Djeo=pJOc0fme_ePLzNv3oJ0iR4bed7iDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Sun, Jun 25, 2017 at 7:35 PM, Petr Jelinek
> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>> (was away for a while, got some time now for this again)
>>
>> On 22/06/17 04:43, Peter Eisentraut wrote:
>>> The alternative is that we use the LockSharedObject() approach that was
>>> already alluded to, like in the attached patch. Both approaches would
>>> work equally fine AFAICT.
>>
>> I agree, but I think we need bigger overhaul of the locking/management
>> in general. So here is patch which does much more changes.
>>
>> The patch does several important things (in no particular order):
>> - Split SetSubscriptionRelState into AddSubscriptionRelState and
>> UpdateSubscriptionRelState for the reasons said upstream, it's cleaner,
>> there is no half-broken upsert logic and it has proper error checking
>> for each action.
>>
>> - Do LockSharedObject in ALTER SUBSCRIPTION, DROP SUBSCRIPTION (this one
>> is preexisting but mentioning it for context), SetSubscriptionRelState,
>> AddSubscriptionRelState, and in the logicalrep_worker_launch. This means
>> we use granular per object locks to deal with concurrency.
>>
>> - Because of above, the AccessExclusiveLock on pg_subscription is no
>> longer needed, just normal RowExlusiveLock is used now.
>>
>> - logicalrep_worker_stop is also simplified due to the proper locking
>>
>> - There is new interface logicalrep_worker_stop_at_commit which is used
>> by ALTER SUBSCRIPTION ... REFRESH PUBLICATION and by transactional
>> variant of DROP SUBSCRIPTION to only kill workers at the end of transaction.
>>
>> - Locking/reading of subscription info is unified between DROP and ALTER
>> SUBSCRIPTION commands.
>>
>> - DROP SUBSCRIPTION will kill all workers associated with subscription,
>> not just apply.
>>
>> - The sync worker checks during startup if the relation is still subscribed.
>>
>> - The sync worker will exit when waiting for apply and apply has shut-down.
>>
>> - The log messages around workers and removed or disabled subscription
>> are now more consistent between startup and normal runtime of the worker.
>>
>> - Some code deduplication and stylistic changes/simplification in
>> related areas.
>>
>> - Fixed catcache's IndexScanOK() handling of the subscription catalog.
>>
>> It's bit bigger patch but solves issues from multiple threads around
>> handling of ALTER/DROP subscription.
>>
>> A lot of the locking that I added is normally done transparently by
>> dependency handling, but subscriptions and subscription relation status
>> do not use that much as it was deemed to bloat pg_depend needlessly
>> during the original patch review (it's also probably why this has
>> slipped through).
>>
>

I've reviewed this patch briefly.

@@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}

/*
+ * Request worker to be stopped on commit.
+ */
+void
+logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
+{
+ LogicalRepWorkerId *wid;
+ MemoryContext old;
+
+ old = MemoryContextSwitchTo(TopTransactionContext);
+
+ wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId));
+
+ /*
+ wid = MemoryContextAlloc(TopTransactionContext,
+
sizeof(LogicalRepWorkerId));
+ */
+ wid->subid = subid;
+ wid->relid = relid;
+
+ on_commit_stop_workers = lappend(on_commit_stop_workers, wid);
+
+ MemoryContextSwitchTo(old);
+}

logicalrep_worker_stop_at_commit() has a problem that new_list()
called by lappend() allocates the memory from current memory context.
It should switch the memory context and then allocate the memory for
wid and append it to the list.

--------
@@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void)
void
AtEOXact_ApplyLauncher(bool isCommit)
{
- if (isCommit && on_commit_launcher_wakeup)
- ApplyLauncherWakeup();
+ ListCell *lc;
+
+ if (isCommit)
+ {
+ foreach (lc, on_commit_stop_workers)
+ {
+ LogicalRepWorkerId *wid = lfirst(lc);
+ logicalrep_worker_stop(wid->subid, wid->relid);
+ }
+
+ if (on_commit_launcher_wakeup)
+ ApplyLauncherWakeup();

Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't
support the prepared transaction. Since we allocate the list
on_commit_stop_workers in TopTransactionContext the postgres crashes
if we execute any query after prepared transaction that removes
subscription relation state. Also after fixed this issue, we still
need to something: the list of on_commit_stop_workers is not
associated the prepared transaction. A query next to "preapre
transaction" tries to stop workers at the commit. There was similar
discussion before.

--------
+
+ ensure_transaction();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+
rstate->relid, rstate->state,
+
rstate->lsn);

Should we commit the transaction if we started a new transaction
before update the subscription relation state, or it could be deadlock
risk.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Barai 2017-06-27 09:21:41 GSoC 2017: weekly progress reports (week 4) and patch for hash index
Previous Message Amit Langote 2017-06-27 07:56:32 Re: Setting pd_lower in GIN metapage