RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-09-05 11:33:18
Message-ID: TY4PR01MB16907EEBB6D0CC953EC191E599403A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, September 5, 2025 2:01 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Sep 4, 2025 at 3:30 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Hi,
> >
> > As reported by Robert[1], it is worth adding a test for the race condition in
> > the RecordTransactionCommitPrepared() function to reduce the risk of
> future code
> > changes:
> >
> > /*
> > * Note it is important to set committs value after marking ourselves
> as
> > * in the commit critical section (DELAY_CHKPT_IN_COMMIT).
> This is because
> > * we want to ensure all transactions that have acquired commit
> timestamp
> > * are finished before we allow the logical replication client to
> advance
> > * its xid which is used to hold back dead rows for conflict
> detection.
> > * See comments atop worker.c.
> > */
> > committs = GetCurrentTimestamp();
> >
> > While writing the test, I noticed a bug that conflict-relevant data could be
> > prematurely removed before applying prepared transactions on the publisher
> that
> > are in the commit phase. This occurred because
> GetOldestActiveTransactionId()
> > was called on the publisher, which failed to account for the backend
> executing
> > COMMIT PREPARED, as this backend does not have an xid stored in
> PGPROC.
> >
> > Since this issue overlaps with the race condition related to timestamp
> > acquisition, I've prepared a bug fix along with a test for the race condition.
> > The 0001 patch fixes this issue by introducing a new function that iterates
> over
> > global transactions and identifies prepared transactions during the commit
> > phase. 0002 added injection points and tap-test to test the bug and
> timestamp
> > acquisition.
> >
>
> Thank You for the patches.Verified 001, works well. Just one minor comment:
>
> 1)
> + PGPROC *proc = GetPGProcByNumber(gxact->pgprocno);
> + PGPROC *commitproc;
> We get first proc and then commitproc. proc is used only to get
> databaseId, why don't we get databaseId from commitproc itself?

I think it's OK to directly refer to commitproc, so changed.

>
> ~~
>
> Few trivial comments for 002:

Thanks for the comments.

> 1)
> +# Test that publisher's transactions marked with
> DELAY_CHKPT_IN_COMMIT prevent
> +# concurrently deleted tuples on the subscriber from being removed.
>
> Here shall we also mention something like:
> This test also acts as a safeguard to prevent developers from moving
> the timestamp acquisition before marking DELAY_CHKPT_IN_COMMIT in
> RecordTransactionCommitPrepared.

Added.

>
> 2)
> # This test depends on an injection point to block the transaction commit after
> # marking DELAY_CHKPT_IN_COMMIT flag.
>
> Shall we say:
> ..to block the prepared transaction commit..

Changed.

>
> 3)
> +# Create a publisher node. Disable autovacuum to stablized the tests related
> to
> +# manual VACUUM and transaction ID.
>
> to stablized --> to stabilize

Fixed.

>
> 4)
> +# Confirm that the dead tuple can be removed now
> +my ($cmdret, $stdout, $stderr) =
> + $node_subscriber->psql('postgres', qq(VACUUM (verbose) public.tab;));
> +
> +ok($stderr =~ qr/1 are dead but not yet removable/,
> + 'the deleted column is non-removable');
>
> can be removed now --> cannot be removed now

Fixed.

Here are v2 patches which addressed above comments.

Best Regards,
Hou zj

Attachment Content-Type Size
v2-0002-Add-a-race-condition-test.patch application/octet-stream 10.2 KB
v2-0001-Fix-conflict-relevant-data-retention-for-prepared.patch application/octet-stream 4.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mihail Nikalayeu 2025-09-05 11:44:59 Re: Parallel Apply
Previous Message Dmitry Mityugov 2025-09-05 11:32:31 Re: --with-llvm on 32-bit platforms?