Re: Parallel Apply

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Apply
Date: 2025-09-06 05:03:30
Message-ID: CAFiTN-tjm7hC83bmh0X75+eK3e72E5MbykG=579ae9kqBPn36A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:

>
> Here is the initial POC patch for this idea.
>
> The basic implementation is outlined below. Please note that there are several
> TODO items remaining, which we are actively working on; these are also detailed
> further down.

Thanks for the patch.

> Each parallel apply worker records the local end LSN of the transaction it
> applies in shared memory. Subsequently, the leader gathers these local end LSNs
> and logs them in the local 'lsn_mapping' for verifying whether they have been
> flushed to disk (following the logic in get_flush_position()).
>
> If no parallel apply worker is available, the leader will apply the transaction
> independently.

I suspect this is not the most performant default strategy and could
frequently cause a performance dip. In general, we use parallel apply
workers on the assumption that applying changes costs far more than
reading messages and sending them to workers.

The current strategy involves the leader picking one transaction for
itself after distributing transactions to all apply workers, assuming
the apply task will take some time to complete. When the leader takes
on an apply task, it becomes a bottleneck for complete parallelism.
This is because it needs to finish applying previous messages before
accepting any new ones. Consequently, even as workers slowly become
free, they won't receive new tasks because the leader is busy applying
its own transaction.

This type of strategy might be suitable in scenarios where users
cannot supply more workers due to resource limitations. However, on
high-end machines, it is more efficient to let the leader act solely
as a message transmitter and allow the apply workers to handle all
apply tasks. This could be a configurable parameter, determining
whether the leader also participates in applying changes. I believe
this should not be the default strategy; in fact, the default should
be for the leader to act purely as a transmitter.
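
A minimal sketch of the stall described above, as a toy Python model
rather than anything taken from the patch. All names (simulate,
leader_applies, dispatch_cost) and the cost numbers are hypothetical,
and the model ignores transaction dependencies and commit ordering. It
only shows that when the leader picks up a long transaction, workers
that finish in the meantime receive nothing new until it is done.

```python
def simulate(apply_costs, num_workers, dispatch_cost, leader_applies):
    """Return the time at which the last transaction finishes applying.

    apply_costs    -- per-transaction apply time, in arrival order
    leader_applies -- hypothetical switch: if True, the leader applies a
                      transaction itself whenever no worker is free
    """
    free_at = [0] * num_workers   # time at which each worker becomes idle
    now = 0                       # the leader's clock
    for cost in apply_costs:
        # Pick the worker that becomes idle first.
        w = min(range(num_workers), key=lambda i: free_at[i])
        if free_at[w] > now:      # every worker is still busy
            if leader_applies:
                # Leader applies the transaction and reads nothing else
                # until it has finished.
                now += cost
                continue
            # Leader only transmits: wait for the first worker to free up.
            now = free_at[w]
        now += dispatch_cost      # hand the transaction to the worker
        free_at[w] = now + cost
    return max([now] + free_at)


# Two workers, cheap dispatch, one long transaction among short ones.
costs = [10, 10, 100, 10, 10, 10, 10, 10, 10]
with_leader = simulate(costs, 2, 1, leader_applies=True)
transmit_only = simulate(costs, 2, 1, leader_applies=False)
print(with_leader, transmit_only)
```

In this particular mix the leader ends up applying the long transaction
while both workers sit idle. The result depends on the cost
distribution, so treat the numbers as an illustration of the mechanism,
not as a benchmark.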

--
Regards,
Dilip Kumar
Google
