RE: 024_add_drop_pub.pl might fail due to deadlock

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Ajin Cherian' <itsajin(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: 024_add_drop_pub.pl might fail due to deadlock
Date: 2025-07-23 09:12:08
Message-ID: OSCPR01MB14966A3AA2A7823E8E52E0D54F55FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Ajin,

Thanks for the patch. Firstly let me confirm my understanding. While altering the
subscription, locks are acquired with below ordering:

Order target level
1 pg_subscription row exclusive
2 pg_subscription, my tuple access exclusive
3 pg_subscription_rel access exclusive
4 pg_replication_origin excluive

In contrast, apply worker works like:

Order target level
1 pg_replication_origin excluive
2 pg_subscription, my tuple access share
3 pg_subscrition_rel row exclusive

Thus a backend may wait at step 4, and apply worker can stuck at step 2 or 3.

Below are my comments:

```
@@ -340,7 +341,6 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
* is dropped. So passing missing_ok = false.
*/
ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
-
```
This change is not needed.

```
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
```

I feel it is too strong, isn't it enough to use row exclusive as initially used?

```
UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
- XLogRecPtr sublsn)
+ XLogRecPtr sublsn, bool lock_needed)
```

I feel the name `lock_needed` is bit misleading, because the function still opens
the pg_subscription_rel catalog with row exclusive. How about "lock_shared_object"?

```
@@ -1460,7 +1460,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* Note that the state can't change because we have already stopped both
* the apply and tablesync workers and they can't restart because of
* exclusive lock on the subscription.
+ *
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any
+ * deadlock with apply workers of other subscriptions trying
+ * to drop tracking origin.
*/
+ sub_rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
```

Hmm. Per my understanding, DropSubscription acquires below locks till it reaches
replorigin_drop_by_name().

Order target level
1 pg_subscription access exclusive
2 pg_subscription, my tuple access exclusive
3 pg_replication_origin excluive

IIUC we must preserve the ordering, not the target of locks.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-07-23 09:26:57 Re: 024_add_drop_pub.pl might fail due to deadlock
Previous Message Álvaro Herrera 2025-07-23 09:08:23 Re: RecoveryTargetAction is left out in xlog_interna.h