From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel Apply |
Date: | 2025-08-18 06:56:35 |
Message-ID: | CABdArM4gv08OWF5Gxndf8cVgO3MVeU9T8z47sZR=rUfL1N9bqw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Here is the initial POC patch for this idea.
>
Thank you Hou-san for the patch.
I did some performance benchmarking of the patch, and overall the
results show substantial performance improvements.
Please find the details as follows:
Source code:
----------------
pgHead (572c0f1b0e) and v1-0001 patch
Setup:
---------
Pub --> Sub
- Two nodes created in pub-sub logical replication setup.
- Both nodes have the same set of pgbench tables created with scale=300.
- The sub node is subscribed to all the changes from the pub node's
pgbench tables.
Workload Run:
--------------------
- Disable the subscription on the Sub node.
- Run the default pgbench (read-write) workload only on the Pub node
with #clients=40 and a run duration of 10 minutes.
- Enable the subscription on Sub once pgbench completes, and then
measure the time taken to replicate the changes.
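The catch-up measurement in the last step could be sketched in Python as below. This is not the attached v1_pa_pub-sub_measure.sh; it is a minimal sketch under stated assumptions. The three callables are hypothetical hooks the caller must supply: one that runs ALTER SUBSCRIPTION ... ENABLE on the Sub node, one that returns the publisher's pg_current_wal_lsn() read after pgbench has finished, and one that returns how far the subscriber has applied (for example, replay_lsn from pg_stat_replication on the publisher). Clock and sleep are injectable so the loop can be exercised without a database.

```python
import time
from typing import Callable


def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn text value such as '0/16B3748' to an integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def measure_catchup(
    enable_subscription: Callable[[], None],
    publisher_lsn: Callable[[], str],
    subscriber_replayed_lsn: Callable[[], str],
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Seconds from enabling the subscription until the subscriber has
    applied everything the publisher wrote during the finished pgbench run.

    ASSUMPTION: all three callables are hypothetical hooks; the replayed-LSN
    hook should return '0/0' until a walsender row for the subscription exists.
    """
    # pgbench has already completed, so the publisher's WAL end is the target.
    target = lsn_to_int(publisher_lsn())
    start = clock()
    enable_subscription()  # e.g. ALTER SUBSCRIPTION <name> ENABLE on Sub
    # Poll until the subscriber reports it has applied up to the target LSN.
    while lsn_to_int(subscriber_replayed_lsn()) < target:
        sleep(poll_interval)
    return clock() - start
```

Reading the target LSN before enabling the subscription keeps the measured interval limited to the apply phase, which is what the replication-time numbers in Test-01 are meant to capture.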
~~~
Test-01: Measure Replication lag
----------------------------------------
Observations:
---------------
- Replication time improved as the number of parallel workers
increased with the patch.
- On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
- With just 2 parallel workers (default), replication time was nearly
halved (1.88x), and with 8 workers it completed in ~13 minutes (3.5x faster).
- With 16 parallel workers, replication achieved a ~3.7x speedup over pgHead.
- With 32 workers, the gains plateaued (slightly slower than with 16
workers), likely because more workers were competing for the machine's
resources and there was not enough parallelizable work left to yield
further improvements.
Detailed Result:
-----------------
Case                    Replication time (sec)  Replication time (min)  Speedup vs pgHead
1. pgHead               2760.791                46.01318333             -
2. patched_#worker=2    1463.853                24.3975                 1.88x
3. patched_#worker=4    1031.376                17.1896                 2.68x
4. patched_#worker=8     781.007                13.0168                 3.54x
5. patched_#worker=16    741.108                12.3518                 3.73x
6. patched_#worker=32    787.203                13.1201                 3.51x
~~~~
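The derived columns of the Test-01 table follow from the measured seconds: minutes = seconds / 60, and speedup = pgHead time / patched time. A small Python sketch that recomputes them from the numbers reported above (speedup shown to three decimals, so the last digit may differ slightly from the table):

```python
# Replication times in seconds, copied from the Test-01 table.
REPLICATION_SEC = {
    "pgHead": 2760.791,
    "patched_#worker=2": 1463.853,
    "patched_#worker=4": 1031.376,
    "patched_#worker=8": 781.007,
    "patched_#worker=16": 741.108,
    "patched_#worker=32": 787.203,
}


def summarize(times: dict[str, float],
              baseline: str = "pgHead") -> dict[str, tuple[float, float]]:
    """Map each case to (minutes, speedup relative to the baseline case)."""
    base = times[baseline]
    return {case: (sec / 60.0, base / sec) for case, sec in times.items()}


if __name__ == "__main__":
    for case, (minutes, speedup) in summarize(REPLICATION_SEC).items():
        print(f"{case:<20} {minutes:9.4f} min  {speedup:.3f}x")
```

The 16-worker case has the highest ratio (about 3.725x), and 32 workers drop back to about 3.5x, which is the plateau described in the observations.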
Test-02: Measure number of transactions parallelized
-----------------------------------------------------
- Used a top-up patch to LOG the number of transactions applied by
parallel workers, the number applied by the leader, and the number of
dependent transactions.
- The LOG output e.g. -
```
LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
```
- parallelized_nxact: the number of parallelized transactions
- dependent_nxact: the number of dependent transactions
- leader_applied_nxact: the number of transactions applied by the leader
apply worker
(The required top-up patch, v1-0002, is attached.)
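To pull these counters out of the server log, a minimal Python sketch could match the three counter names shown in the example LOG line. It assumes the line format is exactly as shown, and it assumes (not confirmed here) that each LOG line reports running totals, so the last matching line is taken as the final count; parse_apply_stats and last_stats are hypothetical helper names.

```python
import re

# Counter names as they appear in the LOG line from the top-up statistics
# patch, e.g.
# LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
COUNTER_RE = re.compile(
    r"\b(parallelized_nxact|dependent_nxact|leader_applied_nxact):\s*(\d+)"
)


def parse_apply_stats(line: str) -> dict[str, int]:
    """Extract the transaction counters from a single LOG line."""
    return {name: int(value) for name, value in COUNTER_RE.findall(line)}


def last_stats(log_lines: list[str]) -> dict[str, int]:
    """Counters from the last matching line (ASSUMPTION: values are cumulative)."""
    result: dict[str, int] = {}
    for line in log_lines:
        parsed = parse_apply_stats(line)
        if parsed:
            result = parsed
    return result
```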
Observations:
----------------
- With 4 to 8 parallel workers, ~80%-98% of transactions were parallelized.
- As the number of workers increased, the parallelized percentage
increased, reaching 99.99% with 16 and 32 workers.
Detailed Result:
-----------------
case1: #parallel_workers = 2 (default)
#total_pgbench_txns = 24745648
parallelized_nxact = 14439480 (58.35%)
dependent_nxact = 16 (0.00006%)
leader_applied_nxact = 10306153 (41.64%)
case2: #parallel_workers = 4
#total_pgbench_txns = 24776108
parallelized_nxact = 19666593 (79.37%)
dependent_nxact = 212 (0.0008%)
leader_applied_nxact = 5109304 (20.62%)
case3: #parallel_workers = 8
#total_pgbench_txns = 24821333
parallelized_nxact = 24397431 (98.29%)
dependent_nxact = 282 (0.001%)
leader_applied_nxact = 423621 (1.71%)
case4: #parallel_workers = 16
#total_pgbench_txns = 24938255
parallelized_nxact = 24937754 (99.99%)
dependent_nxact = 142 (0.0005%)
leader_applied_nxact = 360 (0.0014%)
case5: #parallel_workers = 32
#total_pgbench_txns = 24769474
parallelized_nxact = 24769135 (99.99%)
dependent_nxact = 312 (0.0013%)
leader_applied_nxact = 28 (0.0001%)
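The percentages above are each counter divided by #total_pgbench_txns. A short Python sketch recomputing them from the reported counts (values may differ from the mail in the last printed digit because of rounding):

```python
# (total, parallelized, dependent, leader_applied), copied from the
# detailed Test-02 results, keyed by the number of parallel workers.
RESULTS = {
    2: (24745648, 14439480, 16, 10306153),
    4: (24776108, 19666593, 212, 5109304),
    8: (24821333, 24397431, 282, 423621),
    16: (24938255, 24937754, 142, 360),
    32: (24769474, 24769135, 312, 28),
}


def shares(total: int, parallelized: int, dependent: int,
           leader: int) -> dict[str, float]:
    """Percentage of all pgbench transactions falling into each counter."""
    return {
        "parallelized_nxact": 100.0 * parallelized / total,
        "dependent_nxact": 100.0 * dependent / total,
        "leader_applied_nxact": 100.0 * leader / total,
    }


if __name__ == "__main__":
    for workers, counts in RESULTS.items():
        pct = shares(*counts)
        print(f"workers={workers:>2}: "
              + ", ".join(f"{k}={v:.4f}%" for k, v in pct.items()))
```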
~~~~~
The scripts used for above tests are attached.
Next, I plan to extend the testing to larger workloads by running
pgbench for 20–30 minutes.
We will also benchmark performance across different workload types to
evaluate the improvements once the patch has matured further.
--
Thanks,
Nisha
Attachment | Content-Type | Size |
---|---|---|
v1-0002-Add-some-simple-statistics.txt | text/plain | 2.4 KB |
v1_pa_pub-sub_setup.sh | text/x-sh | 2.0 KB |
v1_pa_pub-sub_measure.sh | text/x-sh | 1.4 KB |