Re: Parallel Apply

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Apply
Date: 2025-11-24 04:26:38
Message-ID: CAFiTN-ut-W1-SvD=txQk0EUXv5RM5c1YdkfJEgZp78yPTZX8BQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 16, 2025 at 3:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Sep 6, 2025 at 10:33 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> > I suspect this might not be the most performant default strategy and
> > could frequently cause a performance dip. In general, we utilize
> > parallel apply workers, considering that the time taken to apply
> > changes is much costlier than reading and sending messages to workers.
> >
> > The current strategy involves the leader picking one transaction for
> > itself after distributing transactions to all apply workers, assuming
> > the apply task will take some time to complete. When the leader takes
> > on an apply task, it becomes a bottleneck for complete parallelism.
> > This is because it needs to finish applying previous messages before
> > accepting any new ones. Consequently, even as workers slowly become
> > free, they won't receive new tasks because the leader is busy applying
> > its own transaction.
> >
> > This type of strategy might be suitable in scenarios where users
> > cannot supply more workers due to resource limitations. However, on
> > high-end machines, it is more efficient to let the leader act solely
> > as a message transmitter and allow the apply workers to handle all
> > apply tasks. This could be a configurable parameter, determining
> > whether the leader also participates in applying changes. I believe
> > this should not be the default strategy; in fact, the default should
> > be for the leader to act purely as a transmitter.
> >
>
> I see your point but consider a scenario where we have two pa workers.
> pa-1 is waiting for some backend on unique_key insertion and pa-2 is
> waiting for pa-1 to complete its transaction as pa-2 has to perform
> some change which is dependent on pa-1's transaction. So, leader can
> either simply wait for a third transaction to be distributed or just
> apply it and process another change. If we follow the earlier then it
> is quite possible that the sender fills the network queue to send data
> and simply timed out.

Sorry I took a while to come back to this. I understand your point and
agree that it's a valid concern. However, I question whether limiting
this to a single choice is the optimal solution. The core issue
involves two distinct roles: work distribution and applying changes.
Work distribution is exclusively handled by the leader, while any
worker can apply the changes. This is essentially a single-producer,
multiple-consumer problem.

While it might seem efficient for the producer (leader) to assist
consumers (workers) when there's a limited number of consumers, I
believe this isn't the best design. In such scenarios, it's generally
better to allow the producer to focus solely on its primary task,
unless there's a severe shortage of processing power.

If computing resources are constrained, allowing the producer to join
the consumers in applying changes is acceptable. However, if sufficient
processing power is available, the producer should ideally be left to
its own duties. The question then becomes: how do we make this
decision?

My suggestion is to make this a configurable parameter. Users could
then decide whether the leader participates in applying changes. This
would provide flexibility: if there are enough workers, the user can
configure the leader to focus solely on its distribution task. OTOH, if
processing power is limited and only a few apply workers (e.g., two, as
in your example) can be set up, users would have the option to configure
the leader to also act as an apply worker when needed.
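
To make the proposal concrete, here is a rough sketch of the dispatch
decision. It is plain Python rather than PostgreSQL code, and the names
(Worker, choose_target, leader_applies) are made up for illustration;
nothing here reflects the actual patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    """Hypothetical stand-in for a parallel apply worker."""
    name: str
    busy: bool = False


def choose_target(workers: List[Worker], leader_applies: bool) -> Optional[str]:
    """Decide who applies the next transaction.

    Prefer any free worker. If none is free, return "leader" when the
    (hypothetical) leader_applies option is enabled; otherwise return
    None, meaning the leader keeps distributing and waits for a worker.
    """
    for w in workers:
        if not w.busy:
            return w.name
    return "leader" if leader_applies else None


# Two workers, both blocked, as in the pa-1 / pa-2 example.
workers = [Worker("pa-1", busy=True), Worker("pa-2", busy=True)]
print(choose_target(workers, leader_applies=True))   # leader applies it
print(choose_target(workers, leader_applies=False))  # leader waits
```

With leader_applies enabled this behaves like the current strategy when
all workers are busy; with it disabled the leader acts purely as a
transmitter. The sketch deliberately ignores the sender timeout concern
you raised, which a real implementation would still have to handle while
the leader waits.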

--
Regards,
Dilip Kumar
Google
