RE: Parallel Apply

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Parallel Apply
Date: 2026-04-23 09:07:22
Message-ID: TYRPR01MB14195CF528AD5AE1450A9B824942A2@TYRPR01MB14195.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, April 23, 2026 2:32 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Apr 23, 2026 at 7:31 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> >
> > On Wed, Apr 22, 2026 at 7:23 PM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> > >
> > ...
> > > Regarding 0001, I did not understand the need of having 2 separate
> messages:
> > >
> > > +#define PARALLEL_APPLY_INTERNAL_MESSAGE 'i'
> > > + LOGICAL_REP_MSG_INTERNAL_MESSAGE = 'i',
> > >
> > > And the need of sending both together in 0003:
> > >
> > > +send_internal_dependencies(ParallelApplyWorkerInfo *winfo, List
> > > *depends_on_xids)
> > > +{
> > > + pq_sendbyte(&dependencies, PARALLEL_APPLY_INTERNAL_MESSAGE);
> > > + pq_sendbyte(&dependencies,
> LOGICAL_REP_MSG_INTERNAL_MESSAGE);
> > >
> > >
> > > Also, it is confusing that above 2 are 'i' and
> > > WORKER_INTERNAL_MSG_RELATION is also 'i'. Code has become very
> tricky
> > > to understand now.
> > >
> > > Reviewing everything, I feel having 'i' outside of LogicalRepMsgType
> > > was better. I think it will eb better to retain
> > > PARALLEL_APPLY_INTERNAL_MESSAGE and getting rid of
> > > LOGICAL_REP_MSG_INTERNAL_MESSAGE. And when any worker
> intercepts
> > > PARALLEL_APPLY_INTERNAL_MESSAGE, it need not dispatch
> > > (apply_dispatch), instead it can handle it using
> > > apply_handle_internal_message()
> > >
> > > Goign above way:
> > > --Messaged received from pub can be handled using apply_dispatch.
> > > --Messages generated from leader to be handled separately/internally
> > > using apply_handle_internal_message().
> > >
> > > That way we have clear-cut boundary between the two types and less
> confusion.
> >
> > Hi Shveta,
> >
> > IIUC these need to be separate because they are used in 2 completly
> > different ways:
> >
> > 1. In LogicalParallelApplyLoop the code need to identify as different
> > from PqReplMsg_WALData
> > 2. In apply_dispach() the message is delegated elsewhere according to
> > the type LogicalRepMsgType
> >
> > PSA a pictue I made for my understanding of the current v15-0001
> > design. It might help to visualize the message format more easily.
> >
> > While your suggestion looks good for LogicalParallelApplyLoop, I think
> > the real problem is going to be in the apply_spooled_mesages() which
> > wants call the apply_dispatch() directly. That won't be possible if
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE is removed. And, you cannot call
> > directly to apply_handle_internal_message() withint knowing it is a
> > PARALLEL_APPLY_INTERNAL_MESSAGE message, but that means first read
> it
> > pq_getmsgbyte(s). Then, you also need some hacky way to "unread" that
> > byte in case it was not the PARALLEL_APPLY_INTERNAL_MESSAGE byte, but
> > something different. AFAIK that was exactly what the previous
> > v14-0001 code was doing with the is_worker_internal_message()
> > function. I also think v15-0001 is a bit confusing, but v14-0001 was
> > even more so.
> >
> > If there was some new function like `pq_peekmsgbyte(s)` which could
> > simply "peek" the message byte value without advancing the cursor.
> > Then, I apply_spooled_mesages() can just peek to find
> > PARALLEL_APPLY_INTERNAL_MESSAGE and your suggested simplification
> > could work. But it would *still* be complicated by the fact that you
> > would have to ensure that PARALLEL_APPLY_INTERNAL_MESSAGE could
> not
> > clash with any of the LogicalRepMsgType! In the end, just keeping the
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE like v14 does may be the best
> way to
> > ensure that uniqueness...
>
> Okay. I see your point. Thanks for explaning.
>
> Another approach could be the one shown in the attached patch. In this
> approach:
>
> a) We avoid pre-reading the message and then rewinding the cursor,
> unlike the approach used in apply_spooled_messages() in v14.
> b) We keep a single LOGICAL_REP_MSG_INTERNAL_MESSAGE for internal
> messages; a separate PARALLEL_APPLY_INTERNAL_MESSAGE wrapper is not
> required.
> c) The caller decides whether to let apply_dispatch read the next
> message or to act on an already pre-read message. This makes the
> design more flexible if we need to handle additional pre-read internal
> messages in the future, without introducing new wrapper message
> formats.
> d) The logic for dispatching actions on all message types remains
> encapsulated within apply_dispatch.

I think the first thing we need to decide is the message format sent to the
parallel worker versus the format used for spooled messages.

Option 1 (Current approach):
Message to parallel worker:
PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
WorkerInternalMsgType + data
Spooled message:
LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
WorkerInternalMsgType + data

Option 2 (Alternative):
Message to parallel worker:
LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
WorkerInternalMsgType + data
Spooled message:
LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
WorkerInternalMsgType + data

Option 3 (Alternative):
Message to parallel worker:
PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
WorkerInternalMsgType + data
Spooled message:
WorkerInternalMsgType + data

In Option 1, the extra PARALLEL_APPLY_INTERNAL_MESSAGE byte allows the parallel
worker to distinguish internal messages from logical replication messages
(which begin with PqReplMsg_WALData). Here, LOGICAL_REP_MSG_INTERNAL_MESSAGE
serves purely as an apply action.

Option 2 also works. The only minor issue is that LOGICAL_REP_MSG_INTERNAL_MESSAGE
serves two purposes: (1) distinguishing from PqReplMsg_WALData in the parallel
worker, and (2) acting as an apply action in apply_spooled_messages(). I don't
think this is a big issue, so I'm not strongly opposed to it.

Option 3 is what the V12 patch implements. It is the simplest approach,
although it requires adding WorkerInternalMsgType values directly into
LogicalRepMsgType, which has been commented previously.

----

The second question is how to implement it.

- Option 1: Used in the latest patch (we can improve it to use distinct byte values for
PARALLEL_APPLY_INTERNAL_MESSAGE and LOGICAL_REP_MSG_INTERNAL_MESSAGE for clarity).

- Option 2

If we want to reuse LOGICAL_REP_MSG_INTERNAL_MESSAGE for both purposes, we could
directly call apply_handle_internal_message in the parallel worker like this (We
might need to set apply_error_callback_arg.command for this calling manually, so
that the errcontext can work):

if (c == PqReplMsg_WALData)
{
...
apply_dispatch(&s);
}
else if (c == LOGICAL_REP_MSG_INTERNAL_MESSAGE)
{
/* Handle the internal message. */
apply_handle_internal_message(&s);
}

Shveta's patch does something similar but adds an extra parameter to
apply_dispatch to control whether the function reads the first byte or uses a
passed-in byte. I'm not sure if changing the interface is worth it, as it seems
to complicate apply_dispatch() unnecessarily.

- Option 3: Used in the older V12 patch.

At the code level, I personally prefer Option 3, but I understand the reluctance
to add internal values to LogicalRepMsgType. So, I'm not sure any of the
proposed alternatives are clearly better.

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2026-04-23 09:25:06 Re: Parallel Apply
Previous Message Amit Kapila 2026-04-23 09:00:08 Re: EXCEPT TABLE - Case inconsistency for describe \d and \dRp+