RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-07-02 07:27:59
Message-ID: TYAPR01MB5724EA41F441F75D4412B57F9440A@TYAPR01MB5724.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wed, Jul 2, 2025 at 2:03 PM Zhijie Hou (Fujitsu) wrote:
>
> On Tue, Jul 1, 2025 at 6:10 PM Zhijie Hou (Fujitsu) wrote:
> > Here is V45 patch set.
>
> With the main patch set now stable, I am summarizing the performance tests
> conducted before for reference.
>
> In earlier tests [1], we confirmed that in a pub-sub cluster with high workload
> on the publisher (via pgbench), the patch had no impact on TPS (Transactions
> Per Second) on the publisher. This indicates that the modifications to the
> walsender responsible for replying to publisher status do not introduce
> noticeable overhead.
>
> Additionally, we confirmed that the patch, with its latest mechanism for
> dynamically tuning the frequency of advancing slot.xmin, does not affect TPS
> on the subscriber when minimal changes occur on the publisher. This test[2]
> involved creating a pub-sub cluster and running pgbench on the subscriber to
> monitor TPS. It further suggests that the logic for maintaining the non-removable
> XID in the apply worker does not introduce noticeable overhead for concurrent
> user DML.
>
> Furthermore, we tested running pgbench on both publisher and subscriber[3].
> Some regression was observed in TPS on the subscriber, because the workload on
> the publisher is quite high and the apply workers must wait for the many
> transactions with earlier timestamps to be applied and flushed before
> advancing the non-removable XID to remove dead tuples. This is the expected
> behavior of this approach since the patch's main goal is to retain dead tuples
> for reliable conflict detection.
>
> When discussing the regression, we considered providing a workaround for
> users to recover from the regression (the 0002 of the latest patch set). We
> introduced a GUC option, max_conflict_retention_duration, designed to prevent
> excessive accumulation of dead tuples when a subscription with
> retain_conflict_info enabled is present and the apply worker cannot catch up
> with the publisher's workload. In short, the conflict detection replication slot
> will be invalidated if the lag time exceeds the specified GUC value.
>
> In performance tests[4], we confirmed that the slot would be invalidated as
> expected when the workload on the publisher was high, and it would not get
> invalidated anymore after reducing the workload. This shows that even if the slot
> has been invalidated once, users can continue to detect the update_deleted
> conflict by reducing the workload on the publisher.
>
> The design of the patch set was not changed since the last performance test;
> only some code enhancements have been made. Therefore, I think the results
> and findings from the previous performance tests are still valid. However, if
> necessary, we can rerun all the tests on the latest patch set to verify the same.
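
For reference, the invalidation rule described above for
max_conflict_retention_duration can be sketched roughly as follows. This is
only an illustration and is not code from the patch set: the function name
retention_limit_exceeded, the millisecond units, and the treatment of 0 as
"no limit" are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch set.
 *
 * lag_ms:           how long the apply worker has been unable to advance
 *                   the non-removable XID (assumed to be in milliseconds).
 * max_retention_ms: the configured limit (assumed: 0 or less means the
 *                   limit is disabled).
 *
 * Returns true when the conflict detection slot would be invalidated
 * under the rule "lag time exceeds the specified GUC value".
 */
static bool
retention_limit_exceeded(int64_t lag_ms, int64_t max_retention_ms)
{
	/* Assumption: a non-positive limit disables the check. */
	if (max_retention_ms <= 0)
		return false;

	/* Strictly greater than, matching "exceeds" in the description. */
	return lag_ms > max_retention_ms;
}
```

With a limit of 1000 ms, a lag of 5000 ms would trip the check, while a lag
of exactly 1000 ms would not.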

During local testing, I discovered a bug caused by my oversight in assigning
the new xmin to slot.effective, which resulted in dead tuples remaining
non-removable until restart. I apologize for the error and have provided
corrected patches. Kindly use the latest patch set for performance testing.

Best Regards,
Hou zj

Attachment                                                       Content-Type              Size
v46-0005-Allow-altering-retain_conflict_info-for-enabled-.patch  application/octet-stream  32.7 KB
v46-0001-Preserve-conflict-relevant-data-during-logical-r.patch  application/octet-stream  173.6 KB
v46-0002-Introduce-a-new-GUC-max_conflict_retention_durat.patch  application/octet-stream  31.1 KB
v46-0003-Re-create-the-replication-slot-if-the-conflict-r.patch  application/octet-stream  7.0 KB
v46-0004-Support-the-conflict-detection-for-update_delete.patch  application/octet-stream  30.3 KB
