From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | RE: Conflict detection for update_deleted in logical replication |
Date: | 2025-07-02 07:27:59 |
Message-ID: | TYAPR01MB5724EA41F441F75D4412B57F9440A@TYAPR01MB5724.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Wed, Jul 2, 2025 at 2:03 PM Zhijie Hou (Fujitsu) wrote:
>
> On Tue, Jul 1, 2025 at 6:10 PM Zhijie Hou (Fujitsu) wrote:
> > Here is V45 patch set.
>
> With the main patch set now stable, I am summarizing the earlier performance
> tests for reference.
>
> In earlier tests [1], we confirmed that in a pub-sub cluster with high workload
> on the publisher (via pgbench), the patch had no impact on TPS (Transactions
> Per Second) on the publisher. This indicates that the walsender modifications
> that handle replying with the publisher status do not introduce noticeable
> overhead.
>
> Additionally, we confirmed that the patch, with its latest mechanism for
> dynamically tuning the frequency of advancing slot.xmin, does not affect TPS
> on the subscriber when minimal changes occur on the publisher. This test[2]
> involved creating a pub-sub cluster and running pgbench on the subscriber to
> monitor TPS. It further suggests that the logic for maintaining non-removable
> xid in the apply worker does not introduce noticeable overhead for concurrent
> user DMLs.
>
> Furthermore, we tested running pgbench on both publisher and subscriber[3].
> Some TPS regression was observed on the subscriber because the workload on
> the publisher was high, and the apply workers must wait for the
> transactions with earlier timestamps to be applied and flushed before
> advancing the non-removable XID so that dead tuples can be removed. This is
> the expected behavior of this approach since the patch's main goal is to
> retain dead tuples for reliable conflict detection.
>
> When discussing the regression, we considered providing a workaround for
> users to recover from the regression (the 0002 of the latest patch set). We
> introduced a GUC option max_conflict_retention_duration, designed to prevent
> excessive accumulation of dead tuples when a subscription with
> retain_conflict_info enabled is present and the apply worker cannot catch up
> with the publisher's workload. In short, the conflict detection replication slot
> will be invalidated if the lag time exceeds the specified GUC value.
>
> In performance tests[4], we confirmed that the slot would be invalidated as
> expected when the workload on the publisher was high, and it would not get
> invalidated anymore after reducing the workload. This shows that even if the
> slot has been invalidated once, users can continue to detect the
> update_deleted conflict by reducing the workload on the publisher.
>
> The design of the patch set was not changed since the last performance test;
> only some code enhancements have been made. Therefore, I think the results
> and findings from the previous performance tests are still valid. However, if
> necessary, we can rerun all the tests on the latest patch set to verify the same.
During local testing, I discovered a bug caused by my oversight in assigning
the new xmin to slot.effective, which resulted in dead tuples remaining
non-removable until restart. I apologize for the error and have provided
corrected patches. Kindly use the latest patch set for performance testing.
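
For readers who want a concrete picture of the max_conflict_retention_duration
rule from the quoted summary, here is a minimal Python sketch. It is not code
from the patch: the function name, the millisecond units, and the treatment of
0 as "no limit" are assumptions made for illustration only.

```python
def should_invalidate_slot(lag_ms: int, max_conflict_retention_duration_ms: int) -> bool:
    """Hypothetical model of the rule: invalidate the conflict detection slot
    once the apply worker's lag exceeds the configured limit.

    Assumption: a limit of 0 means the check is disabled.
    """
    if max_conflict_retention_duration_ms <= 0:
        return False  # assumed: no limit configured, never invalidate
    return lag_ms > max_conflict_retention_duration_ms


# High publisher workload: lag is beyond the limit, so the slot is invalidated.
print(should_invalidate_slot(lag_ms=5000, max_conflict_retention_duration_ms=1000))

# Reduced workload: lag stays under the limit, so the slot is kept.
print(should_invalidate_slot(lag_ms=500, max_conflict_retention_duration_ms=1000))
```

This only models the comparison itself; how the lag is measured and how the
slot is re-created after invalidation are left to the patches.
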
Best Regards,
Hou zj
Attachment | Content-Type | Size |
---|---|---|
v46-0005-Allow-altering-retain_conflict_info-for-enabled-.patch | application/octet-stream | 32.7 KB |
v46-0001-Preserve-conflict-relevant-data-during-logical-r.patch | application/octet-stream | 173.6 KB |
v46-0002-Introduce-a-new-GUC-max_conflict_retention_durat.patch | application/octet-stream | 31.1 KB |
v46-0003-Re-create-the-replication-slot-if-the-conflict-r.patch | application/octet-stream | 7.0 KB |
v46-0004-Support-the-conflict-detection-for-update_delete.patch | application/octet-stream | 30.3 KB |