From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-07-06 11:03:47
Message-ID: OSCPR01MB1496663AED8EEC566074DFBC9F54CA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers
Dear hackers,
For confirmation, I re-ran the performance tests with the four workloads we
used before.
Highlights
==========
The retests on the latest patch set v46 show results consistent with previous
observations:
- There is no performance impact on the publisher side
- There is no performance impact on the subscriber side if the workload runs
only on the subscriber.
- Performance is reduced on the subscriber side (~50% TPS reduction [Test-03])
when retain_conflict_info = on and pgbench runs on both sides. This is caused
by dead tuple retention for conflict detection: under a heavy publisher
workload, the apply workers must wait for the transactions with earlier
timestamps to be applied and flushed before advancing the non-removable XID
to remove dead tuples.
- Subscriber-side TPS improves when the workload on the publisher is reduced.
- Performance on the subscriber can also be improved by tuning the
max_conflict_retention_duration GUC properly.
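For context, the subscriber-side knobs referred to above can be sketched as
below. This is a hedged sketch, not documented syntax: the retain_conflict_info
subscription option and the max_conflict_retention_duration GUC come from the
v46 patch set and may change in later versions; the subscription name,
connection string, and publication name are hypothetical.

```sql
-- Hypothetical names throughout; option/GUC spellings follow the v46 patch set.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=pub_host dbname=postgres'
    PUBLICATION pub
    WITH (retain_conflict_info = on);

-- Cap how long dead tuples may be retained for conflict detection
-- (value in seconds here; used in Test 04 below).
ALTER SYSTEM SET max_conflict_retention_duration = 60;
SELECT pg_reload_conf();
```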
Used source
===========
pgHead commit fd7d7b7191 + v46 patchset
Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, 88 cores, 503 GiB RAM
01. pgbench on publisher
========================
The workload is mostly the same as in [1].
Workload:
- Ran pgbench with 40 clients for the publisher.
- The duration was 120s, and the measurement was repeated 10 times.
(pubtest.tar.gz can run the same workload)
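The workload above can be sketched roughly as follows, assuming pgbench's
built-in tpcb-like script and default connection settings (the actual
pubtest.tar.gz scripts may differ). It is written in dry-run form, printing
each assumed invocation instead of executing it:

```shell
# Dry-run sketch of the publisher-side workload:
# 40 clients, 120 s per run, 10 repetitions, as described above.
CLIENTS=40
DURATION=120
RUNS=10
for run in $(seq 1 "$RUNS"); do
    echo "run $run: pgbench -c $CLIENTS -j $CLIENTS -T $DURATION postgres"
done
```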
Test Scenarios & Results:
- pgHead : Median TPS = 39809.84925
- pgHead + patch : Median TPS = 40102.88108
Observation:
- No performance regression observed with the patch applied.
- The results were consistent across runs.
Detailed Results Table:
- Each cell shows the TPS measured in that run.
- patch(ON) means the patch is applied and retain_conflict_info = on is set.
run#     pgHead          pgHead+patch(ON)
1        40106.88834     40356.60039
2        39854.17244     40087.18077
3        39516.26983     40063.34688
4        39746.45715     40389.40549
5        40014.83857     40537.24
6        39819.26374     40016.78705
7        39800.43476     38774.9827
8        39884.2691      40163.35257
9        39753.11246     39902.02755
10       39427.2353      40118.58138
median   39809.84925     40102.88108
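As a sanity check, the reported medians can be recomputed from the raw runs
with a small POSIX-shell helper (plain sort/awk, no assumptions about the
test scripts); shown here for the pgHead column of this table:

```shell
# median: print the median of its numeric arguments, to 5 decimal places.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) printf "%.5f\n", v[(NR + 1) / 2]
            else        printf "%.5f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# pgHead runs from the table above.
median 40106.88834 39854.17244 39516.26983 39746.45715 40014.83857 \
       39819.26374 39800.43476 39884.2691 39753.11246 39427.2353
# prints 39809.84925
```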
02. pgbench on subscriber
========================
The workload is mostly the same as in [2].
Workload:
- Ran pgbench with 40 clients for the *subscriber*.
- The duration was 120s, and the measurement was repeated 10 times.
(subtest.tar.gz can run the same workload)
Test Scenarios & Results:
- pgHead : Median TPS = 41564.64591
- pgHead + patch : Median TPS = 41083.09555
Observation:
- No performance regression observed with the patch applied.
- The results were consistent across runs.
Detailed Results Table:
run#     pgHead          pgHead+patch(ON)
1        41605.88999     41106.93126
2        41555.76448     40975.9575
3        41505.76161     41223.92841
4        41722.50373     41049.52787
5        41400.48427     41262.15085
6        41386.47969     41059.25985
7        41679.7485      40916.93053
8        41563.60036     41178.82461
9        41565.69145     41672.41773
10       41765.11049     40958.73512
median   41564.64591     41083.09555
03. pgbench on both sides
========================
The workload is mostly the same as in [3].
Workload:
- Ran pgbench with 40 clients on *both sides*.
- The duration was 120s, and the measurement was repeated 10 times.
(bothtest.tar.gz can run the same workload)
Test Scenarios & Results:
Publisher:
- pgHead : Median TPS = 16799.67659
- pgHead + patch : Median TPS = 17338.38423
Subscriber:
- pgHead : Median TPS = 16552.60515
- pgHead + patch : Median TPS = 8367.133693
Observation:
- No performance regression was observed on the publisher with the patch applied.
- Performance dropped on the subscriber side (~50% TPS reduction) due to
dead tuple retention for conflict detection.
Detailed Results Table:
On publisher:
run#     pgHead          pgHead+patch(ON)
1        16735.53391     17369.89325
2        16957.01458     17077.96864
3        16838.07008     17480.08206
4        16743.67772     17531.00493
5        16776.74723     17511.4314
6        16784.73354     17235.76573
7        16871.63841     17255.04538
8        16814.61964     17460.33946
9        16903.14424     17024.77703
10       16556.05636     17306.87522
median   16799.67659     17338.38423
On subscriber:
run#     pgHead          pgHead+patch(ON)
1        16505.27302     8381.200661
2        16765.38292     8353.310973
3        16899.41055     8396.901652
4        16305.05353     8413.058805
5        16722.90536     8320.833085
6        16587.64864     8327.217432
7        16508.45076     8369.205438
8        16357.05337     8394.34603
9        16724.90296     8351.718212
10       16517.56167     8365.061948
median   16552.60515     8367.133693
04. pgbench on both sides, with max_conflict_retention_duration tuned
========================================================================
The workload is mostly the same as in [4].
Workload:
- Initially ran pgbench with 40 clients on *both sides*.
- Set max_conflict_retention_duration = {60, 120}.
- Whenever the slot was invalidated on the subscriber side, the benchmark was
stopped and we waited until the subscriber had caught up; the number of
clients on the publisher was then roughly halved.
- The total duration of the test was 900s for each case.
(max_conflixt.tar.gz can run the same workload)
In this test the conflict slot was invalidated as expected while the workload
on the publisher was high, and it was not invalidated again after the workload
was reduced. This shows that even if the slot has been invalidated once, users
can resume detecting the update_deleted conflict by reducing the workload on
the publisher.
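Slot invalidation in this scenario can be observed on the subscriber. Below is
a hedged sketch using the pg_replication_slots view as it exists on pgHead; no
slot name is assumed here, since the conflict-detection slot's name is defined
by the patch:

```sql
-- List all slots with their invalidation state; an invalidated
-- conflict-detection slot should show a non-null invalidation_reason.
SELECT slot_name, active, invalidation_reason
FROM pg_replication_slots;
```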
Observation:
- The client count on the publisher side was reduced step by step (15 -> 7 -> 3),
and at 3 clients the conflict slot was no longer invalidated.
- TPS on the subscriber side improved as the publisher concurrency was reduced,
because a lighter publisher workload means fewer dead tuples accumulate on the
subscriber.
- With Nclients = 3 on the publisher, there was no regression in the
subscriber's TPS.
Detailed Results Table:
For max_conflict_retention_duration = 60s

On publisher:
Nclients   duration [s]   TPS
15         72             14079.1
7          82             9307
3          446            4133.2

On subscriber:
Nclients   duration [s]   TPS
15         72             6827
15         81             7200
15         446            19129.4

For max_conflict_retention_duration = 120s

On publisher:
Nclients   duration [s]   TPS
15         162            17835.3
7          152            9503.8
3          283            4243.9

On subscriber:
Nclients   duration [s]   TPS
15         162            4571.8
15         152            4707
15         283            19568.4
Thanks to Nisha-san and Hou-san for their help with this work.
[1]: https://www.postgresql.org/message-id/CABdArM5SpMyGvQTsX0-d%3Db%2BJAh0VQjuoyf9jFqcrQ3JLws5eOw%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/TYAPR01MB5692B0182356F041DC9DE3B5F53E2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CABdArM4OEwmh_31dQ8_F__VmHwk2ag_M%3DYDD4H%2ByYQBG%2BbHGzg%40mail.gmail.com
[4]: https://www.postgresql.org/message-id/OSCPR01MB14966F39BE1732B9E433023BFF5E72%40OSCPR01MB14966.jpnprd01.prod.outlook.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED