Re: Parallel Apply

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Parallel Apply
Date: 2026-04-24 04:21:48
Message-ID: CAJpy0uCRMkH7vBwqh83yMoLsShaDEttP+bWYVvM9=HykiXOrsg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 23, 2026 at 2:37 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, April 23, 2026 2:32 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Thu, Apr 23, 2026 at 7:31 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> > wrote:
> > >
> > > On Wed, Apr 22, 2026 at 7:23 PM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> > wrote:
> > > >
> > > ...
> > > > Regarding 0001, I did not understand the need of having 2 separate
> > messages:
> > > >
> > > > +#define PARALLEL_APPLY_INTERNAL_MESSAGE 'i'
> > > > + LOGICAL_REP_MSG_INTERNAL_MESSAGE = 'i',
> > > >
> > > > And the need of sending both together in 0003:
> > > >
> > > > +send_internal_dependencies(ParallelApplyWorkerInfo *winfo, List
> > > > *depends_on_xids)
> > > > +{
> > > > + pq_sendbyte(&dependencies, PARALLEL_APPLY_INTERNAL_MESSAGE);
> > > > + pq_sendbyte(&dependencies,
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE);
> > > >
> > > >
> > > > Also, it is confusing that above 2 are 'i' and
> > > > WORKER_INTERNAL_MSG_RELATION is also 'i'. Code has become very
> > tricky
> > > > to understand now.
> > > >
> > > > Reviewing everything, I feel having 'i' outside of LogicalRepMsgType
> > > > was better. I think it will eb better to retain
> > > > PARALLEL_APPLY_INTERNAL_MESSAGE and getting rid of
> > > > LOGICAL_REP_MSG_INTERNAL_MESSAGE. And when any worker
> > intercepts
> > > > PARALLEL_APPLY_INTERNAL_MESSAGE, it need not dispatch
> > > > (apply_dispatch), instead it can handle it using
> > > > apply_handle_internal_message()
> > > >
> > > > Goign above way:
> > > > --Messaged received from pub can be handled using apply_dispatch.
> > > > --Messages generated from leader to be handled separately/internally
> > > > using apply_handle_internal_message().
> > > >
> > > > That way we have clear-cut boundary between the two types and less
> > confusion.
> > >
> > > Hi Shveta,
> > >
> > > IIUC these need to be separate because they are used in 2 completly
> > > different ways:
> > >
> > > 1. In LogicalParallelApplyLoop the code need to identify as different
> > > from PqReplMsg_WALData
> > > 2. In apply_dispach() the message is delegated elsewhere according to
> > > the type LogicalRepMsgType
> > >
> > > PSA a pictue I made for my understanding of the current v15-0001
> > > design. It might help to visualize the message format more easily.
> > >
> > > While your suggestion looks good for LogicalParallelApplyLoop, I think
> > > the real problem is going to be in the apply_spooled_mesages() which
> > > wants call the apply_dispatch() directly. That won't be possible if
> > > LOGICAL_REP_MSG_INTERNAL_MESSAGE is removed. And, you cannot call
> > > directly to apply_handle_internal_message() withint knowing it is a
> > > PARALLEL_APPLY_INTERNAL_MESSAGE message, but that means first read
> > it
> > > pq_getmsgbyte(s). Then, you also need some hacky way to "unread" that
> > > byte in case it was not the PARALLEL_APPLY_INTERNAL_MESSAGE byte, but
> > > something different. AFAIK that was exactly what the previous
> > > v14-0001 code was doing with the is_worker_internal_message()
> > > function. I also think v15-0001 is a bit confusing, but v14-0001 was
> > > even more so.
> > >
> > > If there was some new function like `pq_peekmsgbyte(s)` which could
> > > simply "peek" the message byte value without advancing the cursor.
> > > Then, I apply_spooled_mesages() can just peek to find
> > > PARALLEL_APPLY_INTERNAL_MESSAGE and your suggested simplification
> > > could work. But it would *still* be complicated by the fact that you
> > > would have to ensure that PARALLEL_APPLY_INTERNAL_MESSAGE could
> > not
> > > clash with any of the LogicalRepMsgType! In the end, just keeping the
> > > LOGICAL_REP_MSG_INTERNAL_MESSAGE like v14 does may be the best
> > way to
> > > ensure that uniqueness...
> >
> > Okay. I see your point. Thanks for explaning.
> >
> > Another approach could be the one shown in the attached patch. In this
> > approach:
> >
> > a) We avoid pre-reading the message and then rewinding the cursor,
> > unlike the approach used in apply_spooled_messages() in v14.
> > b) We keep a single LOGICAL_REP_MSG_INTERNAL_MESSAGE for internal
> > messages; a separate PARALLEL_APPLY_INTERNAL_MESSAGE wrapper is not
> > required.
> > c) The caller decides whether to let apply_dispatch read the next
> > message or to act on an already pre-read message. This makes the
> > design more flexible if we need to handle additional pre-read internal
> > messages in the future, without introducing new wrapper message
> > formats.
> > d) The logic for dispatching actions on all message types remains
> > encapsulated within apply_dispatch.
>
> I think the first thing we need to decide is the message format sent to the
> parallel worker versus the format used for spooled messages.
>
> Option 1 (Current approach):
> Message to parallel worker:
> PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
> LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> WorkerInternalMsgType + data
> Spooled message:
> LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> WorkerInternalMsgType + data
>
> Option 2 (Alternative):
> Message to parallel worker:
> LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> WorkerInternalMsgType + data
> Spooled message:
> LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> WorkerInternalMsgType + data
>
> Option 3 (Alternative):
> Message to parallel worker:
> PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
> WorkerInternalMsgType + data
> Spooled message:
> WorkerInternalMsgType + data
>
> In Option 1, the extra PARALLEL_APPLY_INTERNAL_MESSAGE byte allows the parallel
> worker to distinguish internal messages from logical replication messages
> (which begin with PqReplMsg_WALData). Here, LOGICAL_REP_MSG_INTERNAL_MESSAGE
> serves purely as an apply action.
>
> Option 2 also works. The only minor issue is that LOGICAL_REP_MSG_INTERNAL_MESSAGE
> serves two purposes: (1) distinguishing from PqReplMsg_WALData in the parallel
> worker, and (2) acting as an apply action in apply_spooled_messages(). I don't
> think this is a big issue, so I'm not strongly opposed to it.
>
> Option 3 is what the V12 patch implements. It is the simplest approach,
> although it requires adding WorkerInternalMsgType values directly into
> LogicalRepMsgType, which has been commented previously.
>
> ----
>
> The second question is how to implement it.
>
> - Option 1: Used in the latest patch (we can improve it to use distinct byte values for
> PARALLEL_APPLY_INTERNAL_MESSAGE and LOGICAL_REP_MSG_INTERNAL_MESSAGE for clarity).
>
> - Option 2
>
> If we want to reuse LOGICAL_REP_MSG_INTERNAL_MESSAGE for both purposes, we could
> directly call apply_handle_internal_message in the parallel worker like this (We
> might need to set apply_error_callback_arg.command for this calling manually, so
> that the errcontext can work):
>
> if (c == PqReplMsg_WALData)
> {
> ...
> apply_dispatch(&s);
> }
> else if (c == LOGICAL_REP_MSG_INTERNAL_MESSAGE)
> {
> /* Handle the internal message. */
> apply_handle_internal_message(&s);
> }
>
> Shveta's patch does something similar but adds an extra parameter to
> apply_dispatch to control whether the function reads the first byte or uses a
> passed-in byte. I'm not sure if changing the interface is worth it, as it seems
> to complicate apply_dispatch() unnecessarily.
>
> - Option 3: Used in the older V12 patch.
>
> At the code level, I personally prefer Option 3, but I understand the reluctance
> to add internal values to LogicalRepMsgType. So, I'm not sure any of the
> proposed alternatives are clearly better.
>

Thank You Hou-San for summarizing all the options here.

I think Option 3 makes the implementation simpler, but I don’t think
it’s a good idea to include internal messages (INTERNAL_DEPENDENCY,
INTERNAL_RELATION) in LogicalRepMsgType. LogicalRepMsgType appears to
represent the external message format used between publisher and
subscriber, not internal subscriber messages. If we need to introduce
additional internal messages later (for leader-PA worker communication
or for other purpose), we would have to extend it again. Instead, it
would be better either to avoid modifying LogicalRepMsgType for
subscriber internal messages, or to introduce a broader umbrella
category like LOGICAL_REP_MSG_INTERNAL_MESSAGE.

Now, coming to Option 1:
It uses different communication protocols for PA worker and spooled
messages, which means separate processing would be required for
sending and reading them. I still believe both should follow the same
internal communication protocol, either both should include a format
byte, or neither should. I personally find its implementation harder
to follow.

Option 2 seems like the better approach, IMo at-least, because it
introduces an umbrella category (LOGICAL_REP_MSG_INTERNAL_MESSAGE) for
internal messages. If we need to extend internal messaging in the
future for other purposes, we wouldn’t have to modify
LogicalRepMsgType again; instead, we could extend
WorkerInternalMsgType. This keeps the design cleaner and more
maintainable.

Regarding these minor issues:
---------
> The only minor issue is that LOGICAL_REP_MSG_INTERNAL_MESSAGE
> serves two purposes: (1) distinguishing from PqReplMsg_WALData in the parallel
> worker, and (2) acting as an apply action in apply_spooled_messages(). I don't
> think this is a big issue, so I'm not strongly opposed to it.
--------

I don't really see them as problems. In both cases,
LOGICAL_REP_MSG_INTERNAL_MESSAGE effectively represents an action
type, so using it in these contexts feels consistent. I also think its
reasonable not to have an external format byte for internal messages
and instead treat them purely as actions within both
LogicalParallelApplyLoop() and apply_spooled_messages().

Now, regarding the implementation of Option 2:

I'm fine with the current approach, where apply_dispatch() handles
LOGICAL_REP_MSG_INTERNAL_MESSAGE for the apply_spooled_messages()
case, while LogicalParallelApplyLoop() directly calls
apply_handle_internal_message() instead of going through
apply_dispatch().

That said, it would feel more consistent if both
LogicalParallelApplyLoop() and apply_spooled_messages() followed the
same path, either both calling apply_dispatch() or both calling
apply_handle_internal_message(). Since
LOGICAL_REP_MSG_INTERNAL_MESSAGE is part of LogicalRepMsgType, a
cleaner approach would be to let apply_dispatch() handle the
invocation of apply_handle_internal_message(), and have all callers
route through apply_dispatch() for uniformity.

FYI, I’m not strictly against any of the above approaches; they all
achieve the goal. I’m just sharing my preferences.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2026-04-24 04:26:54 Re: [Patch] Block ALTER TABLE RENAME COLUMN when column is used by property graph
Previous Message wenhui qiu 2026-04-24 03:57:40 Re: New vacuum config to avoid anti wraparound vacuums