RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-09-11 08:59:05
Message-ID: TY4PR01MB1690751D1CA8C128B0770EC6F9409A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Monday, September 8, 2025 7:21 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, September 8, 2025 3:13 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Sep 5, 2025 at 5:03 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com>
> > wrote:
> > >
> > > Here are v2 patches which addressed above comments.
> > >
> >
> > I have pushed the first patch. I find that the test can't reliably fail without a fix.
> > Can you please investigate it?
>
> Thank you for catching this issue. I confirmed that the test may have run
> VACUUM before slot.xmin was advanced. Therefore, to improve the test, I
> modified the test to wait for the publisher's request message to appear twice,
> because after the fix, the apply worker should keep waiting for publisher status
> until the prepared txn is committed.
>
> Also, to reduce test time, I moved the test into the existing 035 test.
>
> Here is the updated test.

I noticed a buildfarm (BF) failure[1] on this test. The log shows that the apply
worker advanced the non-removable xid to the latest state before waiting for the
prepared transaction to commit. Reviewing the log, I found no sign of a bug in
the code. One possible explanation is that the prepared transaction had not yet
reached the injection point when the apply worker requested the publisher
status.

The log does not record when the injection point was reached; it only
includes:

pub: 2025-09-11 03:40:05.667 CEST [396867][client backend][8/3:0] LOG: statement: COMMIT PREPARED 'txn_with_later_commit_ts';
..
sub: 2025-09-11 03:40:05.684 CEST [396798][logical replication apply worker][16/0:0] DEBUG: sending publisher status request message

Although the statement on the publisher appears before the publisher status
request, the statement log entry is written before the command is executed.
Thus, it's possible that the injection point was reached only after the
publisher had responded to the status request.

After checking some other TAP tests that use injection points, I found that most
of them ensure the injection point has been reached before proceeding with the
test (by waiting for the injection point's wait event). We could do the same in
this test:

$node_B->wait_for_event('client backend', 'commit-after-delay-checkpoint');
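
For illustration only, the kind of check such a wait performs can be sketched as a polling loop over pg_stat_activity. This is a simplified sketch, not the actual test code or the patch: the helper name wait_for_wait_event, the attempt limit, and the stub that stands in for a real query against the publisher are all assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PostgreSQL's test modules): poll until
# $run_query->($sql) returns 't', or give up after $max_attempts polls.
sub wait_for_wait_event
{
	my ($run_query, $backend_type, $wait_event, $max_attempts) = @_;

	# pg_stat_activity exposes the backend_type and wait_event columns.
	my $sql = qq[
		SELECT count(*) > 0 FROM pg_stat_activity
		WHERE backend_type = '$backend_type' AND wait_event = '$wait_event'];

	for my $attempt (1 .. $max_attempts)
	{
		return $attempt if $run_query->($sql) eq 't';
		select(undef, undef, undef, 0.1);    # sleep 100ms between polls
	}
	die "timed out waiting for $wait_event wait event\n";
}

# Stub standing in for a real query against the publisher: it reports the
# backend as waiting only from the third poll onwards.
my $calls = 0;
my $stub = sub { return ++$calls >= 3 ? 't' : 'f' };

my $attempts = wait_for_wait_event($stub, 'client backend',
	'commit-after-delay-checkpoint', 10);
print "backend reached the injection point after $attempts polls\n";
```

Only once this check succeeds is the backend known to be blocked at the injection point, so a later publisher status request cannot race ahead of it.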

Here is a small patch.

[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=scorpion&dt=2025-09-11%2001%3A17%3A25&stg=subscription-check

Best Regards,
Hou zj

Attachment Content-Type Size
v1-0001-Fix-unstable-test-in-6456c6e.patch application/octet-stream 1.4 KB
