Quick Links

Re: Conflict detection for update_deleted in logical replication

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: Conflict detection for update_deleted in logical replication
Date:	2025-07-06 14:50:38
Message-ID:	CAD21AoAaJG-TbvqNkcUHdQ45Wd4ajGz3DCifKJ-a-+awxJ1Hzw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Jul 4, 2025 at 8:18 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wed, Jul 2, 2025 at 3:28 PM Hou, Zhijie wrote:
> > Kindly use the latest patch set for performance testing.
>
> During testing, we observed a limitation in cascading logical replication
> setups, such as (A -> B -> C). When retain_conflict_info is enabled on Node C,
> it may not retain information necessary for conflict detection when applying
> changes originally replicated from Node A. This happens because Node C only
> waits for locally originated changes on Node B to be applied before advancing
> the non-removable transaction ID.
>
> For example, Consider a logical replication setup as mentioned above : A -> B -> C.
> - All three nodes have a table t1 with two tuples (1,1) (2,2).
> - Node B subscribed to all changes of t1 from Node A
> - Node-C subscribed to all changes from Node B.
> - Subscriptions use the default origin=ANY, as this is not a bidirectional
> setup.
>
> Now, consider two concurrent operations:
> - @9:00 Node A - UPDATE (1,1) -> (1,11)
>
> - @9:02 Node C - DELETE (1,1)
>
> Assume a slight delay at Node B before it applies the update from Node A.
>
> @9:03 Node C - advances the non-removable XID because it sees no concurrent
> transactions from Node B. It is unaware of Node A’s concurrent update.
>
> @9:04 Node B - receives Node A's UPDATE and applies (1,1) -> (1,11)
> t1 has tuples : (1,11), (2,2)
>
> @9:05 Node C - receives the UPDATE (1,1) -> (1,11)
> - As conflict slot’s xmin is advanced, the deleted tuple may already have
> been removed.
> - Conflict resolution fails to detect update_deleted and instead raises
> update_missing.
>
> Note that, as per decoding logic Node C sees the commit timestamp of the update
> as 9:00 (origin commit_ts from Node A), not 9:04 (commit time on Node B). In
> this case, since the UPDATE's timestamp is earlier than the DELETE, Node C
> should ideally detect an update_deleted conflict. However, it cannot, because
> it no longer retains the deleted tuple.
>
> Even if Node C attempts to retrieve the latest WAL position from Node A, Node C
> doesn't maintain any LSN which we could use to compare with it.
>
> This scenario is similar to another restriction in the patch where
> retain_conflict_info is not supported if the publisher is also a physical
> standby, as the required transaction information from the original primary is
> unavailable. Moreover, this limitation is relevant only when the subscription
> origin option is set to ANY, as only in that case changes from other origins
> can be replicated. Since retain_conflict_info is primarily useful for conflict
> detection in bidirectional clusters where the origin option is set to NONE,
> this limitation appears acceptable.
>
> Given these findings, to help users avoid unintended configurations, we plan to
> issue a warning in scenarios where replicated changes may include origins other
> than the direct publisher, similar to the existing checks in the
> check_publications_origin() function.
>
> Here is the latest patch that implements the warning and documents
> this case. Only 0001 is modified for this.
>
> A big thanks to Nisha for invaluable assistance in identifying this
> case and preparing the analysis for it.

I'm still reviewing the 0001 patch but let me share some comments and
questions I have so far:

---
It seems there is no place where we describe the overall idea of
reliably detecting update_deleted conflicts. The comment atop
maybe_advance_nonremovable_xid() describes why the implemented
algorithm works for that purpose but doesn't how it is implemented,
for example the relationship with pg_conflict_detection slot. I'm not
sure the long comment atop maybe_advance_nonremovable_xid() is the
right place as it seems to be a description beyond explaining
maybe_advance_nonremovable_xid() function. Probably we can move that
comment and explain the overall idea somewhere for example atop
worker.c or in README.

---
The new parameter name "retain_conflict_info" sounds to me like we
keep the conflict information somewhere that has expired at some time
such as how many times insert_exists or update_origin_differs
happened. How about choosing a name that indicates retain dead tuples
more explicitly for example retain_dead_tuples?

---
You mentioned in the previous email:

> Furthermore, we tested running pgbench on both publisher and subscriber[3].
> Some regression was observed in TPS on the subscriber, because workload on the
> publisher is pretty high and the apply workers must wait for the amount of
> transactions with earlier timestamps to be applied and flushed before advancing
> the non-removable XID to remove dead tuples. This is the expected behavior of
> this approach since the patch's main goal is to retain dead tuples for reliable
> conflict detection.

Have you conducted any performance testing of a scenario where a
publisher replicates a large number of databases (say 64) to a
subscriber? I'm particularly interested in a configuration where
retain_conflict_info is set to true, and there are 64 apply workers
running on the subscriber side. In such a setup, even when running
pgbench exclusively on the publisher's databases, I suspect the
replication lag would likely increase quickly, as all apply workers on
the subscriber would be impacted by the overhead of retaining dead
tuples.

---
@@ -71,8 +72,9 @@
#define SUBOPT_PASSWORD_REQUIRED 0x00000800
#define SUBOPT_RUN_AS_OWNER 0x00001000
#define SUBOPT_FAILOVER 0x00002000
-#define SUBOPT_LSN 0x00004000
-#define SUBOPT_ORIGIN 0x00008000
+#define SUBOPT_RETAIN_CONFLICT_INFO 0x00004000
+#define SUBOPT_LSN 0x00008000
+#define SUBOPT_ORIGIN 0x00010000

Why do we need to change the existing options' value?

---
+ * This is required to ensure that we don't advance the xmin
+ * of CONFLICT_DETECTION_SLOT even if one of the subscriptions
+ * is not enabled. Otherwise, we won't be able to detect

I guess the "even" in the first sentence is not necessary.

---
+/*
+ * Determine the minimum non-removable transaction ID across all apply workers
+ * for subscriptions that have retain_conflict_info enabled. Store the result
+ * in *xmin.
+ *
+ * If the replication slot cannot be advanced during this cycle, due to either
+ * a disabled subscription or an inactive worker, set *can_advance_xmin to
+ * false.
+ */
+static void
+compute_min_nonremovable_xid(LogicalRepWorker *worker,
+ bool retain_conflict_info, TransactionId *xmin,
+ bool *can_advance_xmin)

I think this function is quite confusing for several reasons. For
instance, it's doing more things than described in the comments such
as trying to create the CONFLICT_DETECTION_SLOT if no worker is
passed. Also, one of the caller describes:

+ /*
+ * This is required to ensure that we don't advance the xmin
+ * of CONFLICT_DETECTION_SLOT even if one of the subscriptions
+ * is not enabled. Otherwise, we won't be able to detect
+ * conflicts reliably for such a subscription even though it
+ * has set the retain_conflict_info option.
+ */
+ compute_min_nonremovable_xid(NULL, sub->retainconflictinfo,
+ &xmin, &can_advance_xmin);

but it's unclear to me from the function name that it tries to create
the replication slot. Furthermore, in this path it doesn't actually
compute xmin. I guess we can try to create CONFLICT_DETECTION_SLOT in
the loop of "foreach(lc, sublist)" and set false to can_advance_xmin
if either the subscription is disabled or the worker is not running.

---
+ FullTransactionId remote_oldestxid; /* oldest transaction ID that was in
+ * the commit phase on the publisher.
+ * Use FullTransactionId to prevent
+ * issues with transaction ID
+ * wraparound, where a new
+ * remote_oldestxid could falsely
+ * appear to originate from the past
+ * and block advancement */
+ FullTransactionId remote_nextxid; /* next transaction ID to be assigned
+ * on the publisher. Use
+ * FullTransactionId for consistency
+ * and to allow straightforward
+ * comparisons with remote_oldestxid. */

I think it would be readable if we could write them above each field.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

RE: Conflict detection for update_deleted in logical replication at 2025-07-04 11:18:38 from Zhijie Hou (Fujitsu)

Responses

RE: Conflict detection for update_deleted in logical replication at 2025-07-07 02:12:46 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Masahiko Sawada	2025-07-06 14:50:42	Re: Conflict detection for update_deleted in logical replication
Previous Message	Hannu Krosing	2025-07-06 14:48:11	Re: [PATCH] Extending FK check skipping on replicas to ADD FK and TRUNCATE