From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: 024_add_drop_pub.pl might fail due to deadlock |
Date: | 2025-07-23 11:58:39 |
Message-ID: | CAFPTHDZTDWv0wpccxm73koixvQcorPrPSHg=d+sLb23xXS7qEQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 23, 2025 at 4:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jul 21, 2025 at 6:59 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > Yes, this is correct. I have also verified this in my testing that
> > when there is a second subscription, a deadlock can still happen with
> > my previous patch. I have now modified the patch in tablesync worker
> > to take locks on both SubscriptionRelationId and
> > SubscriptionRelRelationId prior to taking lock on
> > ReplicationOriginRelationId. I have also found that there is a similar
> > problem in DropSubscription() where lock on SubscriptionRelRelationId
> > is not taken before dropping the tracking origin. I've also modified
> > the signature of UpdateSubscriptionRelState to take a bool
> > "lock_needed" which if false, the SubscriptionRelationId is not
> > locked. I've made a new version of the patch on PG_15.
> >
>
> Review comments:
> ================
> 1.
> + if (!rel)
> + rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
>
> Why did you acquire AccessExclusiveLock here when the current code has
> RowExclusiveLock? It should be RowExclusiveLock.
>
Yes, you are correct. I have replaced it with RowExclusiveLock.
> 2.
> + *
> + * Lock SubscriptionRelationId with AccessShareLock and
> + * take AccessExclusiveLock on SubscriptionRelRelationId to
> + * prevent any deadlocks with user concurrently performing
> + * refresh on the subscription.
> */
>
> Try to tell in the comments that we want to keep the locking order
> same as DDL commands to prevent deadlocks.
>
Modified.
> 3.
> + * Also close any tables prior to the commit.
> */
> + if (rel)
> + {
> + table_close(rel, AccessExclusiveLock);
> + rel = NULL;
>
> We don't need to explicitly release lock on table_close, it will be
> done at transaction end, so use NoLock here as we do in current HEAD
> code.
>
Done.
> 4.
> DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
> {
> - Relation rel;
> + Relation rel, sub_rel;
> ObjectAddress myself;
> HeapTuple tup;
> Oid subid;
> @@ -1460,7 +1460,13 @@ DropSubscription(DropSubscriptionStmt *stmt,
> bool isTopLevel)
> * Note that the state can't change because we have already stopped both
> * the apply and tablesync workers and they can't restart because of
> * exclusive lock on the subscription.
> + *
> + * Lock pg_subscription_rel with AccessExclusiveLock to prevent any
> + * deadlock with apply workers of other subscriptions trying
> + * to drop tracking origin.
> */
> + sub_rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
>
> I don't think we need AccessExclusiveLock on SubscriptionRelRelationId
> in DropSubscription. Kindly test again after fixing the first comment
> above.
> --
Yes, it was failing because I was taking AccessExclusiveLock in the
apply worker, and that was causing the deadlock in my testing. I
tested this worked.
On Wed, Jul 23, 2025 at 7:12 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Ajin,
>
> Thanks for the patch. Firstly let me confirm my understanding. While altering the
> subscription, locks are acquired with below ordering:
>
> Order target level
> 1 pg_subscription row exclusive
> 2 pg_subscription, my tuple access exclusive
> 3 pg_subscription_rel access exclusive
> 4 pg_replication_origin excluive
>
> In contrast, apply worker works like:
>
> Order target level
> 1 pg_replication_origin excluive
> 2 pg_subscription, my tuple access share
> 3 pg_subscrition_rel row exclusive
>
> Thus a backend may wait at step 4, and apply worker can stuck at step 2 or 3.
>
Yes, that is correct.
> Below are my comments:
>
> ```
> @@ -340,7 +341,6 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
> * is dropped. So passing missing_ok = false.
> */
> ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
> -
> ```
> This change is not needed.
Removed.
>
> ```
> + if (!rel)
> + rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
> +
> ```
>
> I feel it is too strong, isn't it enough to use row exclusive as initially used?
>
Yes, agree. Fixed.
> ```
> UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
> - XLogRecPtr sublsn)
> + XLogRecPtr sublsn, bool lock_needed)
> ```
>
> I feel the name `lock_needed` is bit misleading, because the function still opens
> the pg_subscription_rel catalog with row exclusive. How about "lock_shared_object"?
>
I have modified it to not take lock while table_open as well and
changed the name of the variable to already_locked.
> ```
> @@ -1460,7 +1460,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
> * Note that the state can't change because we have already stopped both
> * the apply and tablesync workers and they can't restart because of
> * exclusive lock on the subscription.
> + *
> + * Lock pg_subscription_rel with AccessExclusiveLock to prevent any
> + * deadlock with apply workers of other subscriptions trying
> + * to drop tracking origin.
> */
> + sub_rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
> ```
>
> Hmm. Per my understanding, DropSubscription acquires below locks till it reaches
> replorigin_drop_by_name().
>
> Order target level
> 1 pg_subscription access exclusive
> 2 pg_subscription, my tuple access exclusive
> 3 pg_replication_origin excluive
>
> IIUC we must preserve the ordering, not the target of locks.
I have removed this change as this does not now conflict with the apply worker.
Two patches attached. One for HEAD and the other for PG_15.
regards,
Ajin Cherian
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
HEAD-v4-0001-Fix-a-possible-deadlock-during-ALTER-SUBSCRIPTION.patch | application/x-patch | 6.2 KB |
PG_15-v4-0001-Fix-a-deadlock-during-ALTER-SUBSCRIPTION.-DROP-PU.patch | application/x-patch | 6.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Srinath Reddy Sadipiralla | 2025-07-23 12:40:00 | is there any specific use of setting transaction_read_only GUC externally? |
Previous Message | Daniel Verite | 2025-07-23 11:53:52 | Re: Collation and primary keys |