From: Nisha Moond <nisha.moond412@gmail.com>
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: shveta malik <shveta.malik@gmail.com>, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>, pgsql-hackers <pgsql-hackers@postgresql.org>, vignesh C <vignesh21@gmail.com>, Dilip Kumar <dilipbalaut@gmail.com>, Masahiko Sawada <sawada.mshk@gmail.com>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-07-25 11:38:01
Message-ID: CABdArM66d9Gn8GKO=1a-YYsiiD1X2fdAxwWS+7KuKuuGFq8S4A@mail.gmail.com
Lists: pgsql-hackers
Hi All,
We conducted performance testing of a bi-directional logical
replication setup, focusing on the primary use case of the
update_deleted feature.
To simulate a realistic scenario, we used a high workload with
limited concurrent updates and writes well distributed among the
servers.
Source used
===========
pgHead commit 62a17a92833 + v47 patch set
Machine details
===============
CPU: Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, 88 cores; RAM: 503 GiB
Test-1: Distributed Write Load
==============================
Highlight:
-----------
- In a bi-directional logical replication setup, with
well-distributed write workloads and a thoughtfully tuned
configuration to minimize lag (e.g., through row filters), TPS
regression is minimal or even negligible.
- Performance can be sustained with significantly fewer apply workers
compared to the number of client connections on the publisher.
Setup:
--------
- 2 nodes (node1 and node2) are created on the same machine with the
same configuration:
autovacuum = false
shared_buffers = '30GB'
-- Worker and logical replication related parameters were also
increased as required (see attached scripts for details).
- Both nodes have two sets of pgbench tables initialized with *scale=300*:
-- set1: pgbench_pub_accounts, pgbench_pub_tellers,
pgbench_pub_branches, and pgbench_pub_history
-- set2: pgbench_accounts, pgbench_tellers, pgbench_branches, and
pgbench_history
- Node1 publishes all changes for the set1 tables, and Node2
subscribes to them.
- Node2 publishes all changes for the set2 tables, and Node1
subscribes to them.
Note: In all the tests, subscriptions are created with (origin = NONE),
since this is a bi-directional setup; a sketch of one pub-sub pair is
given below.
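For reference, here is a minimal sketch of how one such pub-sub pair
could be created (the actual filter predicates, ports, and connection
strings are in the attached scripts; the modulo-based row filter below
is only an assumption for illustration):

# On node1: publish one slice of a set1 table using a row filter
# (the "aid % 30 = 0" predicate is a hypothetical 30-way split).
psql -p 5432 -d postgres <<'SQL'
CREATE PUBLICATION pub_set1_p0
    FOR TABLE pgbench_pub_accounts WHERE (aid % 30 = 0);
SQL

# On node2: subscribe with origin = none so that rows applied by this
# subscription are not sent back to node1 (avoids replication loops in
# a bi-directional setup); copy_data = off is assumed since both nodes
# start with identical data.
psql -p 5433 -d postgres <<'SQL'
CREATE SUBSCRIPTION sub_set1_p0
    CONNECTION 'host=localhost port=5432 dbname=postgres'
    PUBLICATION pub_set1_p0
    WITH (origin = none, copy_data = off);
SQL

Each subscription gets its own apply worker, which is how the
replication load is spread across workers in the multi-pair cases
below.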
Workload Run:
---------------
- On node1, pgbench (read-write) with option "-b simple-update" is run
on the set1 tables.
- On node2, pgbench (read-write) with option "-b simple-update" is run
on the set2 tables.
- #clients = 40
- pgbench run duration = 10 minutes.
- Results were measured over 3 runs for each case.
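The pgbench invocation on node2 would look roughly like the following;
node1 presumably runs an equivalent custom script against the
pgbench_pub_* tables, since the built-in scripts reference the default
table names. The -j and -P values here are assumptions, not from the
report:

pgbench -b simple-update -c 40 -j 40 -T 600 -P 60 -p 5433 postgres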
Test Runs:
- Six tests were run with a varying number of pub-sub pairs; the TPS
reduction on both nodes for each case is shown below:
| Case | # Pub-Sub Pairs | TPS Reduction |
| ---- | --------------- | -------------- |
| 01 | 30 | 0–1% |
| 02 | 15 | 6–7% |
| 03 | 5 | 7–8% |
| 04   | 3               | 0–1%           |
| 05 | 2 | 14–15% |
| 06 | 1 (no filters) | 37–40% |
- With appropriate row filters distributing the load across apply
workers, the performance impact of the update_deleted patch can be
minimized.
- Just 3 pub-sub pairs are enough to keep TPS close to the baseline
for the given workload.
- Poor distribution of the replication workload (e.g., only 1–2
pub-sub pairs) leads to higher overhead due to increased apply worker
contention.
~~~~
Detailed results for all the above cases:
case-01:
---------
- Created 30 pub-sub pairs to distribute the replication load across
30 apply workers on each node.
Results:
| #run | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1 | 5633.377165 | 5579.244492 | 6385.839585 | 6482.775975 |
| 2 | 5926.328644 | 5947.035275 | 6216.045707 | 6416.113723 |
| 3 | 5522.804663 | 5542.380108 | 6541.031535 | 6190.123097 |
| median | 5633.377165 | 5579.244492 | 6385.839585 | 6416.113723 |
| regression | | -1% | | 0% |
- No regression
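(The regression figures appear to be computed on the medians, e.g.,
for Node1 here: (5579.24 - 5633.38) / 5633.38 ≈ -1%.)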
~~~~
case-02:
---------
- #pub-sub pairs = 15
Results:
| #run | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1 | 8207.708475 | 7584.288026 | 8854.017934 | 8204.301497 |
| 2 | 8120.979334 | 7404.735801 | 8719.451895 | 8169.697482 |
| 3 | 7877.859139 | 7536.762733 | 8542.896669 | 8177.853563 |
| median | 8120.979334 | 7536.762733 | 8719.451895 | 8177.853563 |
| regression | | -7% | | -6% |
- There was a 6–7% TPS reduction on both nodes, which seems to be
within an acceptable range.
~~~
case-03:
---------
- #pub-sub pairs = 5
Results:
| #run | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1 | 12325.90315 | 11664.7445 | 12997.47104 | 12324.025 |
| 2 | 12060.38753 | 11370.52775 | 12728.41287 | 12127.61208 |
| 3 | 12390.3677 | 11367.10255 | 13135.02558 | 12036.71502 |
| median | 12325.90315 | 11370.52775 | 12997.47104 | 12127.61208 |
| regression | | -8% | | -7% |
- There was a 7–8% TPS reduction on both nodes, which seems to be
within an acceptable range.
~~~
case-04:
---------
- #pub-sub pairs = 3
Results:
| #run | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1 | 13186.22898 | 12464.42604 | 13973.8394 | 13370.45596 |
| 2 | 13038.15817 | 13014.03906 | 13866.51966 | 13866.47395 |
| 3 | 13881.10513 | 13868.71971 | 14687.67444 | 14516.33854 |
| median | 13186.22898 | 13014.03906 | 13973.8394 | 13866.47395 |
| regression | | -1% | | -1% |
- No regression observed.
~~~
case-05:
---------
- #pub-sub pairs = 2
Results:
| #run | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1 | 15936.98792 | 13563.98476 | 16734.35292 | 14527.22942 |
| 2 | 16031.23003 | 13648.24979 | 16958.49609 | 14657.80008 |
| 3 | 16113.79935 | 13550.68329 | 17029.5035 | 14509.84068 |
| median | 16031.23003 | 13563.98476 | 16958.49609 | 14527.22942 |
| regression | | -15% | | -14% |
- TPS was reduced by 14–15% on both nodes.
~~~
case-06:
---------
- #pub-sub pairs = 1; no row filters are used on either node
Results:
| #run | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1 | 22900.06507 | 13609.60639 | 23254.25113 | 14592.25271 |
| 2 | 22110.98426 | 13907.62583 | 22755.89945 | 14805.73717 |
| 3 | 22719.88901 | 13246.41484 | 23055.70406 | 14256.54223 |
| median | 22719.88901 | 13609.60639 | 23055.70406 | 14592.25271 |
| regression | | -40% | | -37% |
- The observed regression is 37–40% on both nodes.
~~~~
Test-2: High concurrency
===========================
Highlight:
------------
Despite poor write distribution across the servers and high concurrent
updates, distributing the replication load across multiple apply
workers limited the TPS drop to just 15–18%.
Setup:
---------------
- 2 nodes (node1 and node2) are created with the same configuration as
in Test-1.
- Both nodes have the same set of pgbench tables, initialized with
scale=60 (small tables to increase concurrent updates).
- Both nodes are subscribed to each other for all the changes.
-- 15 pub-sub pairs are created using row filters to distribute the
load, and all the subscriptions are created with (origin = NONE); a
sketch of such a split is below.
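A rough sketch of how such a 15-way split could be scripted (only
pgbench_accounts is shown, and the modulo predicate is an assumption;
the real filters are in the attached scripts). The mirror-image pairs
from node2 to node1 would be created the same way:

# Create 15 publications on node1, each covering a disjoint slice of
# pgbench_accounts, and subscribe to each from node2 with
# origin = none, so that every pair gets its own apply worker.
for i in $(seq 0 14); do
    psql -p 5432 -d postgres -c "CREATE PUBLICATION pub_p$i FOR TABLE pgbench_accounts WHERE (aid % 15 = $i);"
    psql -p 5433 -d postgres -c "CREATE SUBSCRIPTION sub_p$i CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub_p$i WITH (origin = none, copy_data = off);"
done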
Workload Run:
---------------
- On both nodes, the default pgbench (read-write) workload is run on
the tables.
- #clients = 15
- pgbench run duration = 5 minutes.
- Results were measured over 2 runs for each case.
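The run on each node would be approximately as follows (pgbench's
default built-in script is tpcb-like; the -j value is an assumption):

pgbench -c 15 -j 15 -T 300 -p 5432 postgres    # node1
pgbench -c 15 -j 15 -T 300 -p 5433 postgres    # node2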
Results:
Node1 TPS:
| #run | pgHead_Node1_TPS | patched_Node1_TPS |
| ---- | ---------------- | ----------------- |
| 1 | 9585.470749 | 7660.645249 |
| 2 | 9442.364918 | 8035.531482 |
| median | 9513.917834 | 7848.088366 |
| regression | | -18% |
Node2 TPS:
| #run | pgHead_Node2_TPS | patched_Node2_TPS |
| ---- | ---------------- | ----------------- |
| 1 | 9485.232611 | 8248.783417 |
| 2 | 9468.894086 | 7938.991136 |
| median | 9477.063349 | 8093.887277 |
| regression | | -15% |
- Under high concurrent writes to the same small tables, contention
increases and the TPS drop is 15–18% on both nodes.
~~~~
The scripts used for the above tests are attached.
--
Thanks,
Nisha
Attachment: bi_dir_test_scripts.zip (application/x-zip-compressed, 3.5 KB)