Re: Parallel Apply

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Apply
Date: 2025-08-18 14:49:56
Message-ID: ae5c5a41-2f68-4088-8fcc-58ed71a7f82f@garret.ru
Lists: pgsql-hackers


On 18/08/2025 9:56 AM, Nisha Moond wrote:
> On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>> Here is the initial POC patch for this idea.
>>
> Thank you Hou-san for the patch.
>
> I did some performance benchmarking for the patch and overall, the
> results show substantial performance improvements.
> Please find the details as follows:
>
> Source code:
> ----------------
> pgHead (572c0f1b0e) and v1-0001 patch
>
> Setup:
> ---------
> Pub --> Sub
> - Two nodes created in pub-sub logical replication setup.
> - Both nodes have the same set of pgbench tables created with scale=300.
> - The sub node is subscribed to all the changes from the pub node's
> pgbench tables.
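(For reference, a minimal sketch of such a setup; the host names,
database, and publication/subscription names below are illustrative, not
taken from the attached scripts:)

```
# Initialize identical pgbench tables (scale 300) on both nodes.
pgbench -i -s 300 -h pub postgres
pgbench -i -s 300 -h sub postgres

# Publisher: publish all changes to the pgbench tables.
psql -h pub -d postgres -c "CREATE PUBLICATION pgbench_pub FOR TABLE
    pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers;"

# Subscriber: both sides already hold the same data, so skip the initial copy.
psql -h sub -d postgres -c "CREATE SUBSCRIPTION pgbench_sub
    CONNECTION 'host=pub dbname=postgres' PUBLICATION pgbench_pub
    WITH (copy_data = false);"
```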
>
> Workload Run:
> --------------------
> - Disable the subscription on the Sub node.
> - Run default pgbench (read-write) only on the Pub node with
> #clients=40 and a run duration of 10 minutes.
> - Enable the subscription on Sub once pgbench completes, and then
> measure the time taken to replicate (sketched below).
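(Roughly, the run looks like this, assuming the illustrative host and
subscription names from the sketch above:)

```
# Pause apply on the subscriber while the workload runs on the publisher.
psql -h sub -d postgres -c "ALTER SUBSCRIPTION pgbench_sub DISABLE;"

# Default read-write pgbench on the publisher: 40 clients, 10 minutes.
pgbench -h pub -c 40 -j 40 -T 600 postgres

# Re-enable the subscription and measure how long it takes to catch up.
psql -h sub -d postgres -c "ALTER SUBSCRIPTION pgbench_sub ENABLE;"
```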
> ~~~
>
> Test-01: Measure Replication lag
> ----------------------------------------
> Observations:
> ---------------
> - Replication time improved as the number of parallel workers
> increased with the patch.
> - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
> - With just 2 parallel workers (the default), replication time was cut
> in half, and with 8 workers it completed in ~13 minutes (3.5x faster).
> - With 16 parallel workers, the speedup reached ~3.7x over pgHead.
> - With 32 workers, the gains plateaued slightly, likely because the
> machine was running more workers while the amount of parallelizable
> work was not high enough to show further improvement.
>
> Detailed Result:
> -----------------
> Case                   Time_taken_in_replication(sec)  rep_time_in_minutes  faster_than_head
> 1. pgHead                       2760.791                   46.01318333             -
> 2. patched_#worker=2            1463.853                   24.3975              1.88 times
> 3. patched_#worker=4            1031.376                   17.1896              2.68 times
> 4. patched_#worker=8             781.007                   13.0168              3.54 times
> 5. patched_#worker=16            741.108                   12.3518              3.73 times
> 6. patched_#worker=32            787.203                   13.1201              3.51 times
> ~~~~
>
> Test-02: Measure number of transactions parallelized
> -----------------------------------------------------
> - Used a top-up patch to LOG the number of transactions applied by
> parallel workers, the number applied by the leader, and the number of
> dependent transactions.
> - Example LOG output:
> ```
> LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
> ```
> - parallelized_nxact: the number of parallelized transactions
> - dependent_nxact: the number of dependent transactions
> - leader_applied_nxact: the number of transactions applied by the leader worker
> (the required top-up v1-0002 patch is attached.)
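(The counters can be pulled out of the subscriber's log with something
like the following; the log file name is illustrative:)

```
# Extract the counter lines emitted by the top-up patch.
grep -Eo 'parallelized_nxact: [0-9]+ dependent_nxact: [0-9]+ leader_applied_nxact: [0-9]+' \
    subscriber.log
```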
>
> Observations:
> ----------------
> - With 4 to 8 parallel workers, ~80%-98% of transactions were parallelized.
> - As the number of workers increased, the parallelized percentage
> increased, reaching 99.99% with 16 or more workers.
>
> Detailed Result:
> -----------------
> case1: #parallel_workers = 2(default)
> #total_pgbench_txns = 24745648
> parallelized_nxact = 14439480 (58.35%)
> dependent_nxact = 16 (0.00006%)
> leader_applied_nxact = 10306153 (41.64%)
>
> case2: #parallel_workers = 4
> #total_pgbench_txns = 24776108
> parallelized_nxact = 19666593 (79.37%)
> dependent_nxact = 212 (0.0008%)
> leader_applied_nxact = 5109304 (20.62%)
>
> case3: #parallel_workers = 8
> #total_pgbench_txns = 24821333
> parallelized_nxact = 24397431 (98.29%)
> dependent_nxact = 282 (0.001%)
> leader_applied_nxact = 423621 (1.71%)
>
> case4: #parallel_workers = 16
> #total_pgbench_txns = 24938255
> parallelized_nxact = 24937754 (99.99%)
> dependent_nxact = 142 (0.0005%)
> leader_applied_nxact = 360 (0.0014%)
>
> case5: #parallel_workers = 32
> #total_pgbench_txns = 24769474
> parallelized_nxact = 24769135 (99.99%)
> dependent_nxact = 312 (0.0013%)
> leader_applied_nxact = 28 (0.0001%)
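(The percentages are simply each counter divided by the total transaction
count; e.g. for case1:)

```
# parallelized share in case1 = parallelized_nxact / total_pgbench_txns
awk 'BEGIN { printf "%.2f%%\n", 14439480 / 24745648 * 100 }'   # prints 58.35%
```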
>
> ~~~~~
> The scripts used for the above tests are attached.
>
> Next, I plan to extend the testing to larger workloads by running
> pgbench for 20–30 minutes.
> We will also benchmark performance across different workload types to
> evaluate the improvements once the patch has matured further.
>
> --
> Thanks,
> Nisha

I also did some benchmarking of the proposed parallel apply patch and
compared it with my prewarming approach. As expected, parallel apply is
significantly more efficient than prefetch.

I ran two tests (more details here):

https://www.postgresql.org/message-id/flat/84ed36b8-7d06-4945-9a6b-3826b3f999a6%40garret.ru#70b45c44814c248d3d519a762f528753

One performs random updates and the other performs inserts with random
keys. I stop the subscriber, apply the workload at the publisher for 100
seconds, and then measure how long it takes the subscriber to catch up.
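
The catch-up time can be measured roughly like this (host names are
illustrative, and a single subscription is assumed):

```
# Capture the publisher's WAL position once the 100-second workload stops,
# then poll the subscriber until apply has passed that point.
target=$(psql -h pub -d postgres -Atc "SELECT pg_current_wal_lsn()")
start=$(date +%s)
until [ "$(psql -h sub -d postgres -Atc \
    "SELECT min(latest_end_lsn) >= '$target' FROM pg_stat_subscription")" = t ]
do
    sleep 1
done
echo "catch-up took $(( $(date +%s) - start ))s"
```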

update test (with 8 parallel apply workers):

    master:          8:30 min
    prefetch:        2:05 min
    parallel apply:  1:30 min

insert test (with 8 parallel apply workers):

    master:          9:20 min
    prefetch:        3:08 min
    parallel apply:  1:54 min
