Re: Parallel Apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Apply
Date: 2025-08-19 03:07:55
Message-ID: CAA4eK1JkZ1JNQ71eO9+0QwSncLNFQv_KauYERNzxRhNUGcYDTA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2025 at 8:20 PM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>
> On 18/08/2025 9:56 AM, Nisha Moond wrote:
> > On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >> Here is the initial POC patch for this idea.
> >>
> > Thank you Hou-san for the patch.
> >
> > I did some performance benchmarking for the patch and overall, the
> > results show substantial performance improvements.
> > Please find the details as follows:
> >
> > Source code:
> > ----------------
> > pgHead (572c0f1b0e) and v1-0001 patch
> >
> > Setup:
> > ---------
> > Pub --> Sub
> > - Two nodes created in pub-sub logical replication setup.
> > - Both nodes have the same set of pgbench tables created with scale=300.
> > - The sub node is subscribed to all the changes from the pub node's
> > pgbench tables.
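For readers reproducing the setup, a minimal sketch of the two-node preparation might look like the following. The host aliases (pubhost, subhost) and the publication/subscription names (pub_all, sub_all) are assumptions, not taken from the attached scripts; with RUN unset the commands are only printed, set RUN= to execute them:

```shell
# Sketch of the pub-sub setup described above; all names are assumptions.
setup_nodes() {
    run=${RUN-echo}
    # Same pgbench tables, scale=300, on both nodes.
    $run pgbench -i -s 300 -h pubhost postgres
    $run pgbench -i -s 300 -h subhost postgres
    # Publish all changes on the publisher; subscribe from the subscriber.
    $run psql -h pubhost -c "CREATE PUBLICATION pub_all FOR ALL TABLES;"
    $run psql -h subhost -c "CREATE SUBSCRIPTION sub_all CONNECTION 'host=pubhost dbname=postgres' PUBLICATION pub_all;"
}
setup_nodes
```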
> >
> > Workload Run:
> > --------------------
> > - Disable the subscription on Sub node
> > - Run default pgbench(read-write) only on Pub node with #clients=40
> > and run duration=10 minutes
> > - Enable the subscription on Sub once pgbench completes and then
> > measure time taken in replication.
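The run procedure above can be sketched as a small driver script (again an illustration only: the subscription/slot name sub_all and the host aliases are assumptions; with RUN unset the commands are only printed, set RUN= to execute them):

```shell
# Sketch of the workload run described above; names are assumptions.
run_workload() {
    run=${RUN-echo}
    # 1. Disable the subscription so changes accumulate on the publisher.
    $run psql -h subhost -c "ALTER SUBSCRIPTION sub_all DISABLE;"
    # 2. Default (read-write) pgbench: 40 clients, 10 minutes.
    $run pgbench -c 40 -j 40 -T 600 -h pubhost postgres
    # 3. Re-enable, then time how long the subscriber takes to catch up,
    #    e.g. by polling until the slot's confirmed_flush_lsn reaches the
    #    publisher's current WAL position.
    $run psql -h subhost -c "ALTER SUBSCRIPTION sub_all ENABLE;"
    $run psql -h pubhost -tAc "SELECT pg_current_wal_lsn() = confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'sub_all';"
}
run_workload
```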
> > ~~~
> >
> > Test-01: Measure Replication lag
> > ----------------------------------------
> > Observations:
> > ---------------
> > - Replication time improved as the number of parallel workers
> > increased with the patch.
> > - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
> > - With just 2 parallel workers (the default), replication time was cut
> > roughly in half, and with 8 workers it completed in ~13 minutes (3.5x
> > faster).
> > - With 16 parallel workers, we achieved a ~3.7x speedup over pgHead.
> > - With 32 workers, the gains plateaued slightly, likely because the
> > machine was already running many workers and the workload does not
> > contain enough independent work to benefit from further parallelism.
> >
> > Detailed Result:
> > -----------------
> > Case                   Time_taken_in_replication(sec)  rep_time_in_minutes  faster_than_head
> > 1. pgHead                                    2760.791          46.01318333  -
> > 2. patched_#worker=2                         1463.853          24.3975      1.88 times
> > 3. patched_#worker=4                         1031.376          17.1896      2.68 times
> > 4. patched_#worker=8                          781.007          13.0168      3.54 times
> > 5. patched_#worker=16                         741.108          12.3518      3.73 times
> > 6. patched_#worker=32                         787.203          13.1201      3.51 times
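As a sanity check, the faster_than_head column can be recomputed from the seconds column. A one-liner for that (the arithmetic is ours, not part of the patch; the last digit may differ slightly from the table due to rounding):

```shell
# Speedup = pgHead seconds / patched seconds, using values from the table above.
speedup() { awk -v h="$1" -v p="$2" 'BEGIN { printf "%.2f\n", h / p }'; }
speedup 2760.791 1463.853   # workers=2
speedup 2760.791 741.108    # workers=16
```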
> > ~~~~
> >
> > Test-02: Measure number of transactions parallelized
> > -----------------------------------------------------
> > - Used a top-up patch to LOG the number of transactions applied by
> > parallel workers, the number applied by the leader, and the number
> > that are dependent.
> > - The LOG output e.g. -
> > ```
> > LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
> > ```
> > - parallelized_nxact: the number of parallelized transactions
> > - dependent_nxact: the number of dependent transactions
> > - leader_applied_nxact: the number of transactions applied by the leader worker
> > (the required top-up v1-002 patch is attached.)
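The percentages in the per-case results can be derived directly from one such LOG line; for example (the log text is as quoted above, the arithmetic is ours):

```shell
# Derive the parallelized share from the counters in a quoted LOG line.
log='LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600'
echo "$log" | awk '{
    parallel = $3; dependent = $5; leader = $7
    total = parallel + dependent + leader
    printf "parallelized: %.2f%%\n", 100 * parallel / total
}'
```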
> >
> > Observations:
> > ----------------
> > - With 4 to 8 parallel workers, ~80%-98% of transactions are parallelized.
> > - As the number of workers increased, the parallelized percentage
> > increased, reaching 99.99% with 16 and 32 workers.
> >
> > Detailed Result:
> > -----------------
> > case1: #parallel_workers = 2(default)
> > #total_pgbench_txns = 24745648
> > parallelized_nxact = 14439480 (58.35%)
> > dependent_nxact = 16 (0.00006%)
> > leader_applied_nxact = 10306153 (41.64%)
> >
> > case2: #parallel_workers = 4
> > #total_pgbench_txns = 24776108
> > parallelized_nxact = 19666593 (79.37%)
> > dependent_nxact = 212 (0.0008%)
> > leader_applied_nxact = 5109304 (20.62%)
> >
> > case3: #parallel_workers = 8
> > #total_pgbench_txns = 24821333
> > parallelized_nxact = 24397431 (98.29%)
> > dependent_nxact = 282 (0.001%)
> > leader_applied_nxact = 423621 (1.71%)
> >
> > case4: #parallel_workers = 16
> > #total_pgbench_txns = 24938255
> > parallelized_nxact = 24937754 (99.99%)
> > dependent_nxact = 142 (0.0005%)
> > leader_applied_nxact = 360 (0.0014%)
> >
> > case5: #parallel_workers = 32
> > #total_pgbench_txns = 24769474
> > parallelized_nxact = 24769135 (99.99%)
> > dependent_nxact = 312 (0.0013%)
> > leader_applied_nxact = 28 (0.0001%)
> >
> > ~~~~~
> > The scripts used for the above tests are attached.
> >
> > Next, I plan to extend the testing to larger workloads by running
> > pgbench for 20–30 minutes.
> > We will also benchmark performance across different workload types to
> > evaluate the improvements once the patch has matured further.
> >
> > --
> > Thanks,
> > Nisha
>
>
> I also did some benchmarking of the proposed parallel apply patch and
> compared it with my prewarming approach. As expected, parallel apply is
> significantly more efficient than prefetch.
>

Thanks to you and Nisha for doing some preliminary performance
testing; the results are really encouraging (3x to 4x improvement
across multiple workloads). I hope we keep making progress on this
patch and make it ready for the next release.

--
With Regards,
Amit Kapila.
