From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel Apply |
Date: | 2025-08-18 14:49:56 |
Message-ID: | ae5c5a41-2f68-4088-8fcc-58ed71a7f82f@garret.ru |
Lists: | pgsql-hackers |
On 18/08/2025 9:56 AM, Nisha Moond wrote:
> On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>> Here is the initial POC patch for this idea.
>>
> Thank you Hou-san for the patch.
>
> I did some performance benchmarking for the patch and overall, the
> results show substantial performance improvements.
> Please find the details as follows:
>
> Source code:
> ----------------
> pgHead (572c0f1b0e) and v1-0001 patch
>
> Setup:
> ---------
> Pub --> Sub
> - Two nodes created in a pub-sub logical replication setup.
> - Both nodes have the same set of pgbench tables created with scale=300.
> - The Sub node subscribes to all changes from the Pub node's pgbench
> tables (a sketch of this setup follows below).
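>
> A minimal sketch of such a setup (node ports and the names pub1/sub1
> are placeholders; the attached scripts are the authoritative version):
> ```
> # identical pgbench tables on both nodes, scale factor 300
> pgbench -i -s 300 -p 5432 postgres   # Pub
> pgbench -i -s 300 -p 5433 postgres   # Sub
>
> # publish all tables on Pub; subscribe on Sub without the initial copy,
> # since the tables already exist on both nodes
> psql -p 5432 -d postgres -c "CREATE PUBLICATION pub1 FOR ALL TABLES;"
> psql -p 5433 -d postgres -c "CREATE SUBSCRIPTION sub1
>   CONNECTION 'host=localhost port=5432 dbname=postgres'
>   PUBLICATION pub1 WITH (copy_data = false);"
> ```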
>
> Workload Run:
> --------------------
> - Disable the subscription on the Sub node
> - Run default pgbench (read-write) only on the Pub node with #clients=40
> and a run duration of 10 minutes
> - Enable the subscription on Sub once pgbench completes, then measure
> the time taken to replicate (a rough sketch of this sequence follows).
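>
> A rough sketch of this sequence (again with placeholder names/ports):
> ```
> # pause apply so changes queue up on the publisher
> psql -p 5433 -d postgres -c "ALTER SUBSCRIPTION sub1 DISABLE;"
>
> # default (TPC-B like) read-write pgbench run: 40 clients, 10 minutes
> pgbench -p 5432 -c 40 -j 40 -T 600 postgres
>
> # resume apply and time how long the subscriber takes to catch up
> psql -p 5433 -d postgres -c "ALTER SUBSCRIPTION sub1 ENABLE;"
> ```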
> ~~~
>
> Test-01: Measure Replication lag
> ----------------------------------------
> Observations:
> ---------------
> - Replication time improved as the number of parallel workers
> increased with the patch.
> - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
> - With just 2 parallel workers (the default), replication time was cut
> roughly in half, and with 8 workers it completed in ~13 minutes (3.5x faster).
> - With 16 parallel workers, we achieved a ~3.7x speedup over pgHead.
> - With 32 workers, the gains plateaued slightly, likely because the
> machine was running more workers while the amount of work that can be
> applied in parallel was not large enough to show further improvement.
>
> Detailed Result:
> -----------------
> Case                      Repl. time (sec)   Repl. time (min)   Speedup vs. pgHead
> 1. pgHead                       2760.791          46.01                -
> 2. patched, #workers=2          1463.853          24.40             1.88x
> 3. patched, #workers=4          1031.376          17.19             2.68x
> 4. patched, #workers=8           781.007          13.02             3.54x
> 5. patched, #workers=16          741.108          12.35             3.73x
> 6. patched, #workers=32          787.203          13.12             3.51x
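>
> For reference, one way to detect the catch-up point when timing the
> replication could be to poll pg_stat_replication on the publisher (an
> assumption; the attached scripts may measure this differently):
> ```
> # publisher WAL position right after the pgbench run finishes
> target=$(psql -At -p 5432 -d postgres -c "SELECT pg_current_wal_lsn();")
>
> # the logical walsender's application_name defaults to the subscription
> # name; replay_lsn is the apply position confirmed by the subscriber
> until [ "$(psql -At -p 5432 -d postgres -c "SELECT replay_lsn >= '$target'
>     FROM pg_stat_replication WHERE application_name = 'sub1';")" = "t" ]
> do
>     sleep 1
> done
> ```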
> ~~~~
>
> Test-02: Measure number of transactions parallelized
> -----------------------------------------------------
> - Used a top-up patch to LOG the number of transactions applied by
> parallel workers, applied by the leader, and those that are dependent.
> - The LOG output e.g. -
> ```
> LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
> ```
> - parallelized_nxact: the number of parallelized transactions
> - dependent_nxact: the number of dependent transactions
> - leader_applied_nxact: the number of transactions applied by the leader worker
> (The required top-up v1-002 patch is attached.)
>
> Observations:
> ----------------
> - With 4 to 8 parallel workers, ~80%-98% of transactions were parallelized.
> - As the number of workers increased, the parallelized share grew,
> reaching 99.99% with 32 workers.
>
> Detailed Result:
> -----------------
> case1: #parallel_workers = 2 (default)
> #total_pgbench_txns = 24745648
> parallelized_nxact = 14439480 (58.35%)
> dependent_nxact = 16 (0.00006%)
> leader_applied_nxact = 10306153 (41.64%)
>
> case2: #parallel_workers = 4
> #total_pgbench_txns = 24776108
> parallelized_nxact = 19666593 (79.37%)
> dependent_nxact = 212 (0.0008%)
> leader_applied_nxact = 5109304 (20.62%)
>
> case3: #parallel_workers = 8
> #total_pgbench_txns = 24821333
> parallelized_nxact = 24397431 (98.29%)
> dependent_nxact = 282 (0.001%)
> leader_applied_nxact = 423621 (1.71%)
>
> case4: #parallel_workers = 16
> #total_pgbench_txns = 24938255
> parallelized_nxact = 24937754 (99.99%)
> dependent_nxact = 142 (0.0005%)
> leader_applied_nxact = 360 (0.0014%)
>
> case5: #parallel_workers = 32
> #total_pgbench_txns = 24769474
> parallelized_nxact = 24769135 (99.99%)
> dependent_nxact = 312 (0.0013%)
> leader_applied_nxact = 28 (0.0001%)
>
> ~~~~~
> The scripts used for the above tests are attached.
>
> Next, I plan to extend the testing to larger workloads by running
> pgbench for 20–30 minutes.
> We will also benchmark performance across different workload types to
> evaluate the improvements once the patch has matured further.
>
> --
> Thanks,
> Nisha
I also did some benchmarking of the proposed parallel apply patch and
compared it with my prewarming (prefetch) approach.
Parallel apply is significantly more efficient than prefetch, which is
expected.
I ran two tests (more details here): one performs random updates and the
other inserts rows with random keys.
I stop the subscriber, apply the workload at the publisher for 100
seconds, and then measure how long it takes the subscriber to catch up.
update test (with 8 parallel apply workers):
master: 8:30 min
prefetch: 2:05 min
parallel apply: 1:30 min
insert test (with 8 parallel apply workers):
master: 9:20 min
prefetch: 3:08 min
parallel apply: 1:54 min
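
For reference, workloads of this kind could be expressed as custom
pgbench scripts along these lines (a sketch only: the table definition,
key ranges and client counts are placeholders, not the actual test
scripts):
```
# assumes a table t(pk bigint primary key, counter int) created and
# replicated on both nodes beforehand

# update test: random-key updates
cat > update.sql <<'EOF'
\set id random(1, 10000000)
UPDATE t SET counter = counter + 1 WHERE pk = :id;
EOF

# insert test: inserts with a random key (ignore occasional duplicates)
cat > insert.sql <<'EOF'
\set id random(1, 1000000000)
INSERT INTO t (pk, counter) VALUES (:id, 0) ON CONFLICT DO NOTHING;
EOF

# 100-second burst on the publisher while the subscriber is stopped
pgbench -n -c 10 -j 10 -T 100 -f update.sql postgres
```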