Re: Conflict detection for update_deleted in logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-07-24 03:42:17
Message-ID: CAJpy0uDiyjDzLU-=NGO7PnXB4OLy4+RxJiAySdw=a+YO62JO2g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 23, 2025 at 12:53 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, July 23, 2025 12:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 23, 2025 at 3:51 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've reviewed the 0001 patch and it looks good to me.
> > >
> >
> > Thanks, I have pushed the 0001 patch.
>
> Thanks for pushing. I have rebased the remaining patches.
>
> I have reordered the patches to prioritize the detection of update_deleted as
> the initial patch. This can give us more time to consider the new GUC, since the
> performance-related aspects have been documented.
>

Thank You for patches. Please find a few comments for 0001:

1)
+ if (HeapTupleSatisfiesVacuum(tuple, oldestXmin, buf)
== HEAPTUPLE_RECENTLY_DEAD)
+ dead = true;

Shall we name the variable as 'recently_dead' as 'dead' can even mean
HEAPTUPLE_DEAD.

2)
+ if (MySubscription->retaindeadtuples &&
+ FindMostRecentlyDeletedTupleInfo(localrel, remoteslot,
+
&conflicttuple.xmin,
+
&conflicttuple.origin,
+
&conflicttuple.ts) &&
+ conflicttuple.origin != replorigin_session_origin)
+ type = CT_UPDATE_DELETED;
+ else
+ type = CT_UPDATE_MISSING;

Shall the conflict be detected as update_deleted irrespective of origin?

3)
+ /*
+ * We do not consider HEAPTUPLE_DEAD status because it indicates
+ * either tuples whose inserting transaction was
aborted, meaning
+ * there is no commit timestamp or origin, or tuples
deleted by a
+ * transaction older than oldestXmin, making it safe
to ignore them
+ * during conflict detection (See comments atop
+ * maybe_advance_nonremovable_xid() for details).
+ */

a) We can use parentheses for below otherwise sentence becomes
confusing due to multiple 'or' with 'either'
(meaning there is no commit timestamp or origin)

b) we can change last line to: See comments atop worker.c for details

4)
+ <para>
+ The tuple to be updated was deleted by another origin. The update will
+ simply be skipped in this scenario.
+ Note that this conflict can only be detected when
+ <xref linkend="guc-track-commit-timestamp"/>
+ and <link
linkend="sql-createsubscription-params-with-retain-dead-tuples"><literal>retain_dead_tuples</literal></link>
+ are enabled. Note that if a tuple cannot be found due to the table being
+ truncated only a <literal>update_missing</literal> conflict will
+ arise
+ </para>

4a)
Can we please make the link as:
<link linkend="guc-track-commit-timestamp"><varname>track_commit_timestamp</varname></link>

This is because the current usage of 'xref linkend' gives it a
slightly bigger font in html file, while all other references of this
GUC is normal font.

4b) We need a comma after truncated:
Note that if a tuple cannot be found due to the table being truncated,
only a update_missing conflict will arise.

5)
monitoring.sgml:
+ <para>
+ Number of times the tuple to be updated was deleted by another origin
+ during the application of changes. See <xref
linkend="conflict-update-deleted"/>
+ for details about this conflict.
+ </para></entry>

Here we are using the term 'by another origin', while in the rest of
the doc (see confl_update_origin_differs, confl_delete_origin_differs)
we use the term 'by another source'. Shall we keep it the same?
OTOH, I think using 'origin' is better but the rest of the page is
using source. So perhaps changing source to origin everywhere is
better. Thoughts?
This can be changed if needed once we decide on point 2 above.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-07-24 03:43:18 Re: Fixing MSVC's inability to detect elog(ERROR) does not return
Previous Message Richard Guo 2025-07-24 03:40:50 Re: Pathify RHS unique-ification for semijoin planning