Re: Conflict detection for update_deleted in logical replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-07-05 11:21:34
Message-ID: CAFiTN-sw0zRNcO40=o8vtBPS1_bsb3TStByewu-MyxDhCb+cDg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 5, 2025 at 2:26 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Jul 4, 2025 at 4:48 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wed, Jul 2, 2025 at 3:28 PM Hou, Zhijie wrote:
> > > Kindly use the latest patch set for performance testing.
> >
> > During testing, we observed a limitation in cascading logical replication
> > setups, such as (A -> B -> C). When retain_conflict_info is enabled on Node C,
> > it may not retain information necessary for conflict detection when applying
> > changes originally replicated from Node A. This happens because Node C only
> > waits for locally originated changes on Node B to be applied before advancing
> > the non-removable transaction ID.
> >
> > For example, Consider a logical replication setup as mentioned above : A -> B -> C.
> > - All three nodes have a table t1 with two tuples (1,1) (2,2).
> > - Node B subscribed to all changes of t1 from Node A
> > - Node-C subscribed to all changes from Node B.
> > - Subscriptions use the default origin=ANY, as this is not a bidirectional
> > setup.
> >
> > Now, consider two concurrent operations:
> > - @9:00 Node A - UPDATE (1,1) -> (1,11)
> >
> > - @9:02 Node C - DELETE (1,1)
> >
> > Assume a slight delay at Node B before it applies the update from Node A.
> >
> > @9:03 Node C - advances the non-removable XID because it sees no concurrent
> > transactions from Node B. It is unaware of Node A’s concurrent update.
> >
> > @9:04 Node B - receives Node A's UPDATE and applies (1,1) -> (1,11)
> > t1 has tuples : (1,11), (2,2)
> >
> > @9:05 Node C - receives the UPDATE (1,1) -> (1,11)
> > - As conflict slot’s xmin is advanced, the deleted tuple may already have
> > been removed.
> > - Conflict resolution fails to detect update_deleted and instead raises
> > update_missing.
> >
> > Note that, as per decoding logic Node C sees the commit timestamp of the update
> > as 9:00 (origin commit_ts from Node A), not 9:04 (commit time on Node B). In
> > this case, since the UPDATE's timestamp is earlier than the DELETE, Node C
> > should ideally detect an update_deleted conflict. However, it cannot, because
> > it no longer retains the deleted tuple.
> >
> > Even if Node C attempts to retrieve the latest WAL position from Node A, Node C
> > doesn't maintain any LSN which we could use to compare with it.
> >
> > This scenario is similar to another restriction in the patch where
> > retain_conflict_info is not supported if the publisher is also a physical
> > standby, as the required transaction information from the original primary is
> > unavailable. Moreover, this limitation is relevant only when the subscription
> > origin option is set to ANY, as only in that case changes from other origins
> > can be replicated. Since retain_conflict_info is primarily useful for conflict
> > detection in bidirectional clusters where the origin option is set to NONE,
> > this limitation appears acceptable.
> >
> > Given these findings, to help users avoid unintended configurations, we plan to
> > issue a warning in scenarios where replicated changes may include origins other
> > than the direct publisher, similar to the existing checks in the
> > check_publications_origin() function.
> >
> > Here is the latest patch that implements the warning and documents
> > this case. Only 0001 is modified for this.
> >
> > A big thanks to Nisha for invaluable assistance in identifying this
> > case and preparing the analysis for it.
>
> In this setup if we have A->B->C->A then after we implement conflict
> resolution is it possible that node A will just left with (2,2),
> because (1,11) will be deleted while applying the changes from Node C
> whereas node C has detected the indirect conflicting update from Node
> A as update missing and has inserted the row and it will left with
> (1,11) and (2,2). So can it cause divergence as I explained here, or
> it will not? If not then can you explain how?

Thinking further, I believe this will lead to data divergence.
However, in this specific setup, that's not a concern. If a user needs
to guarantee consistency across all nodes, they'll have to configure a
one-to-one publication-subscription relationship between each pair of
nodes: A to B and B to A, B to C and C to B, and A to C and C to A. In
a cascading setup, however, we cannot expect all nodes to contain
identical data. So I think I am fine with giving a WARNING what you
have done in your patch.

--
Regards,
Dilip Kumar
Google

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2025-07-05 12:41:27 Re: Add \pset options for boolean value display
Previous Message Dmitry Dolgov 2025-07-05 10:35:44 Re: Changing shared_buffers without restart