| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Parallel Apply |
| Date: | 2026-04-24 11:08:29 |
| Message-ID: | CAA4eK1+NnsRjyHy-0Wkm8T9DeW3u5BZnOP+bQFXPSmVfHVxO1w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Apr 24, 2026 at 9:52 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Apr 23, 2026 at 2:37 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Thursday, April 23, 2026 2:32 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Apr 23, 2026 at 7:31 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Wed, Apr 22, 2026 at 7:23 PM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> > > wrote:
> > > > >
> > > > ...
> > > > > Regarding 0001, I did not understand the need of having 2 separate
> > > messages:
> > > > >
> > > > > +#define PARALLEL_APPLY_INTERNAL_MESSAGE 'i'
> > > > > + LOGICAL_REP_MSG_INTERNAL_MESSAGE = 'i',
> > > > >
> > > > > And the need of sending both together in 0003:
> > > > >
> > > > > +send_internal_dependencies(ParallelApplyWorkerInfo *winfo, List
> > > > > *depends_on_xids)
> > > > > +{
> > > > > + pq_sendbyte(&dependencies, PARALLEL_APPLY_INTERNAL_MESSAGE);
> > > > > + pq_sendbyte(&dependencies,
> > > LOGICAL_REP_MSG_INTERNAL_MESSAGE);
> > > > >
> > > > >
> > > > > Also, it is confusing that above 2 are 'i' and
> > > > > WORKER_INTERNAL_MSG_RELATION is also 'i'. Code has become very
> > > tricky
> > > > > to understand now.
> > > > >
> > > > > Reviewing everything, I feel having 'i' outside of LogicalRepMsgType
> > > > > was better. I think it will eb better to retain
> > > > > PARALLEL_APPLY_INTERNAL_MESSAGE and getting rid of
> > > > > LOGICAL_REP_MSG_INTERNAL_MESSAGE. And when any worker
> > > intercepts
> > > > > PARALLEL_APPLY_INTERNAL_MESSAGE, it need not dispatch
> > > > > (apply_dispatch), instead it can handle it using
> > > > > apply_handle_internal_message()
> > > > >
> > > > > Goign above way:
> > > > > --Messaged received from pub can be handled using apply_dispatch.
> > > > > --Messages generated from leader to be handled separately/internally
> > > > > using apply_handle_internal_message().
> > > > >
> > > > > That way we have clear-cut boundary between the two types and less
> > > confusion.
> > > >
> > > > Hi Shveta,
> > > >
> > > > IIUC these need to be separate because they are used in 2 completly
> > > > different ways:
> > > >
> > > > 1. In LogicalParallelApplyLoop the code need to identify as different
> > > > from PqReplMsg_WALData
> > > > 2. In apply_dispach() the message is delegated elsewhere according to
> > > > the type LogicalRepMsgType
> > > >
> > > > PSA a pictue I made for my understanding of the current v15-0001
> > > > design. It might help to visualize the message format more easily.
> > > >
> > > > While your suggestion looks good for LogicalParallelApplyLoop, I think
> > > > the real problem is going to be in the apply_spooled_mesages() which
> > > > wants call the apply_dispatch() directly. That won't be possible if
> > > > LOGICAL_REP_MSG_INTERNAL_MESSAGE is removed. And, you cannot call
> > > > directly to apply_handle_internal_message() withint knowing it is a
> > > > PARALLEL_APPLY_INTERNAL_MESSAGE message, but that means first read
> > > it
> > > > pq_getmsgbyte(s). Then, you also need some hacky way to "unread" that
> > > > byte in case it was not the PARALLEL_APPLY_INTERNAL_MESSAGE byte, but
> > > > something different. AFAIK that was exactly what the previous
> > > > v14-0001 code was doing with the is_worker_internal_message()
> > > > function. I also think v15-0001 is a bit confusing, but v14-0001 was
> > > > even more so.
> > > >
> > > > If there was some new function like `pq_peekmsgbyte(s)` which could
> > > > simply "peek" the message byte value without advancing the cursor.
> > > > Then, I apply_spooled_mesages() can just peek to find
> > > > PARALLEL_APPLY_INTERNAL_MESSAGE and your suggested simplification
> > > > could work. But it would *still* be complicated by the fact that you
> > > > would have to ensure that PARALLEL_APPLY_INTERNAL_MESSAGE could
> > > not
> > > > clash with any of the LogicalRepMsgType! In the end, just keeping the
> > > > LOGICAL_REP_MSG_INTERNAL_MESSAGE like v14 does may be the best
> > > way to
> > > > ensure that uniqueness...
> > >
> > > Okay. I see your point. Thanks for explaning.
> > >
> > > Another approach could be the one shown in the attached patch. In this
> > > approach:
> > >
> > > a) We avoid pre-reading the message and then rewinding the cursor,
> > > unlike the approach used in apply_spooled_messages() in v14.
> > > b) We keep a single LOGICAL_REP_MSG_INTERNAL_MESSAGE for internal
> > > messages; a separate PARALLEL_APPLY_INTERNAL_MESSAGE wrapper is not
> > > required.
> > > c) The caller decides whether to let apply_dispatch read the next
> > > message or to act on an already pre-read message. This makes the
> > > design more flexible if we need to handle additional pre-read internal
> > > messages in the future, without introducing new wrapper message
> > > formats.
> > > d) The logic for dispatching actions on all message types remains
> > > encapsulated within apply_dispatch.
> >
> > I think the first thing we need to decide is the message format sent to the
> > parallel worker versus the format used for spooled messages.
> >
> > Option 1 (Current approach):
> > Message to parallel worker:
> > PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> > WorkerInternalMsgType + data
> > Spooled message:
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> > WorkerInternalMsgType + data
> >
> > Option 2 (Alternative):
> > Message to parallel worker:
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> > WorkerInternalMsgType + data
> > Spooled message:
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
> > WorkerInternalMsgType + data
> >
> > Option 3 (Alternative):
> > Message to parallel worker:
> > PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
> > WorkerInternalMsgType + data
> > Spooled message:
> > WorkerInternalMsgType + data
> >
>
> Thank You Hou-San for summarizing all the options here.
>
> I think Option 3 makes the implementation simpler, but I don’t think
> it’s a good idea to include internal messages (INTERNAL_DEPENDENCY,
> INTERNAL_RELATION) in LogicalRepMsgType. LogicalRepMsgType appears to
> represent the external message format used between publisher and
> subscriber, not internal subscriber messages. If we need to introduce
> additional internal messages later (for leader-PA worker communication
> or for other purpose), we would have to extend it again. Instead, it
> would be better either to avoid modifying LogicalRepMsgType for
> subscriber internal messages, or to introduce a broader umbrella
> category like LOGICAL_REP_MSG_INTERNAL_MESSAGE.
>
> Now, coming to Option 1:
> It uses different communication protocols for PA worker and spooled
> messages, which means separate processing would be required for
> sending and reading them. I still believe both should follow the same
> internal communication protocol, either both should include a format
> byte, or neither should. I personally find its implementation harder
> to follow.
>
> Option 2 seems like the better approach, IMo at-least, because it
> introduces an umbrella category (LOGICAL_REP_MSG_INTERNAL_MESSAGE) for
> internal messages. If we need to extend internal messaging in the
> future for other purposes, we wouldn’t have to modify
> LogicalRepMsgType again; instead, we could extend
> WorkerInternalMsgType. This keeps the design cleaner and more
> maintainable.
>
+1 for the Option-2 as it simplifies the handling in patch as compared
to others.
Apart from that,
+/*
+ * Worker internal message types
+ *
+ * This type of messages would be generated by leader apply worker and sent to
+ * the parallel apply worker.
+ */
+typedef enum WorkerInternalMsgType
+{
+ WORKER_INTERNAL_MSG_DEPENDENCY = 'd',
+ WORKER_INTERNAL_MSG_RELATION = 'i',
+} WorkerInternalMsgType;
+
I think here INTERNAL makes the purpose unclear whereas we know that
these will always be used to communicate wth parallel apply workers.
So, how about one of following options:
typedef enum PAWorkerMsgType
{
PA_MSG_XACT_DEPENDENCY = 'd',
PA_MSG_RELMAP = 'i',
} PAWorkerMsgType;
OR
typedef enum ParallelApplyMsgType
{
PARALLEL_APPLY_MSG_XACT_DEPENDENCY = 'd',
PARALLEL_APPLY_MSG_RELMAP = 'i',
} ParallelApplyMsgType;
--
With Regards,
Amit Kapila.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Fujii Masao | 2026-04-24 11:23:50 | Re: pgbench: make verbose error messages thread-safe |
| Previous Message | Jim Jones | 2026-04-24 10:41:30 | Re: PoC - psql - emphases line with table name in verbose output |