Re: Asynchronous MergeAppend

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2026-04-05 02:24:48
Message-ID: CAPpHfdsO8zYpDW==D6T5N0cJ+AzK7a_OyXJoYU1kFi=xZFTLuQ@mail.gmail.com
Lists: pgsql-hackers

Hi!

On Mon, Mar 30, 2026 at 3:25 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
> On 29/03/26 22:20, Alexander Korotkov wrote:
> > Thank you for your work on this subject.
> > I have revised the patchset. I think it would be better if common
> > infrastructure goes first. Otherwise we commit async merge append and
> > immediately revise it. I also did some minor improvements.
> >
>
> I was thinking about this but did not manage to spend time on it.
> Thanks for re-organizing the patches, it looks better and I think that
> it makes more sense in this order.
>
> I also agree with the minor improvements.

I did more work on the patchset.

Patch #1 now treats IncrementalSort as an exclusion alongside
Sort. The exclusion check is now at the top of the switch().
Patch #2 is now split into three patches: common structures, common sync
append logic, and common async append logic.
The new structs are named AppendBase/AppendBaseState, and the
corresponding fields are "ab" and "as".
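To illustrate the shape of the Patch #1 check, here is a minimal standalone sketch. It assumes the exclusion rejects a subplan that the planner has wrapped in a Sort or IncrementalSort (as MergeAppend may do to obtain sorted input), because such a plan no longer corresponds to its subpath. The enum values and sketch_mark_async_capable() are hypothetical stand-ins, not the actual planner code, and the per-path cases are simplified.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PostgreSQL plan node tags. */
typedef enum SketchNodeTag
{
	SK_T_Sort,
	SK_T_IncrementalSort,
	SK_T_Result,
	SK_T_ForeignScan,
	SK_T_SubqueryScan
} SketchNodeTag;

/* Hypothetical stand-ins for PostgreSQL path node tags. */
typedef enum SketchPathTag
{
	SK_T_ForeignPath,
	SK_T_SubqueryScanPath,
	SK_T_ProjectionPath
} SketchPathTag;

/*
 * Exclusion first: a Sort or IncrementalSort on top of the subplan means
 * the plan does not match the subpath, so reject it before the switch.
 */
static bool
sketch_mark_async_capable(SketchNodeTag plan_tag, SketchPathTag path_tag)
{
	if (plan_tag == SK_T_Sort || plan_tag == SK_T_IncrementalSort)
		return false;

	switch (path_tag)
	{
		case SK_T_ForeignPath:
			/* simplified: the real check also asks the FDW */
			return plan_tag == SK_T_ForeignScan;
		case SK_T_SubqueryScanPath:
			return plan_tag == SK_T_SubqueryScan;
		case SK_T_ProjectionPath:
			/* a separate Result node for projection cannot run async */
			return plan_tag != SK_T_Result;
	}
	return false;
}
```

Putting the Sort/IncrementalSort test ahead of the switch means no individual case has to remember to repeat it.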
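The AppendBase/AppendBaseState split resembles the common C idiom of embedding a shared base struct as the first member, so shared code can take a pointer to the base type. The sketch below assumes that layout; apart from the "ab" and "as" member names mentioned above, every type and field is hypothetical and far smaller than the real plan and executor node definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily trimmed stand-ins for Plan and PlanState. */
typedef struct SketchPlan
{
	int			plan_node_id;
} SketchPlan;

typedef struct SketchPlanState
{
	const SketchPlan *plan;
} SketchPlanState;

/* Plan-time fields shared by Append and MergeAppend (illustrative). */
typedef struct AppendBase
{
	SketchPlan	plan;			/* must be first */
	int			nplans;			/* number of child subplans */
	int			nasyncplans;	/* how many of them are async-capable */
} AppendBase;

typedef struct Append
{
	AppendBase	ab;				/* common part; must be first */
	int			first_partial_plan;
} Append;

typedef struct MergeAppend
{
	AppendBase	ab;				/* common part; must be first */
	int			numCols;		/* number of sort key columns */
} MergeAppend;

/* Executor-state fields shared by both node types (illustrative). */
typedef struct AppendBaseState
{
	SketchPlanState ps;			/* must be first */
	int			nasyncplans;
	int			nasyncremain;	/* async requests still outstanding */
} AppendBaseState;

typedef struct MergeAppendState
{
	AppendBaseState as;			/* common part; must be first */
	int			ms_nkeys;
} MergeAppendState;

/* Shared helpers see only the base types, whatever the concrete node. */
static int
append_base_nplans(const AppendBase *ab)
{
	assert(ab->nasyncplans <= ab->nplans);
	return ab->nplans;
}

static bool
append_base_async_pending(const AppendBaseState *as)
{
	return as->nasyncremain > 0;
}
```

Because the base is the first member, both `&node->ab` and a cast of the node pointer itself yield a valid base pointer.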

Most importantly, I noticed that this patchset actually makes only the
initial heap filling asynchronous. The steady-state work after that is
still synchronous. Even though it used the async infrastructure, it fetched
tuples from the child subplans one by one: effectively synchronous, but
paying for the asynchronous infrastructure. I think that even with this
limitation the patchset is valuable, because the startup cost of child
ForeignScans can be high. This understanding also allowed me to
significantly simplify the main patch, including:
1) After the initial heap filling, use ExecProcNode() to fetch from child plans.
2) Removed ms_has_asyncresults entirely. Async responses are stored directly
into ms_slots[] (the existing heap slot array), which serves as both
the merge state and the "result arrived" indicator via TupIsNull().
3) Removed needrequest usage from MergeAppend. Since MergeAppend only
fires initial requests (via ExecAppendBaseAsyncBegin()) and never
sends follow-up requests, needrequest tracking is unnecessary.
ExecMergeAppendAsyncRequest() was eliminated entirely.
4) ExecMergeAppendAsyncGetNext() is reduced to a simple wait loop.
5) The asyncresults allocation is reduced back to nasyncplans entries.
MergeAppend doesn't use it (it stores results in ms_slots[]), and Append
only needs nasyncplans entries for its stack.
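The simplified control flow above can be modeled with a small standalone simulation: initial requests go out to the async children, a wait loop runs until each has answered into its slot, and after that every fetch is an ordinary synchronous call. All names are hypothetical: an isnull flag stands in for TupIsNull(), sketch_exec_proc_node() for ExecProcNode(), and a counter of outstanding requests is an assumption (the email does not say how an async child that returns no rows is detected). A linear minimum scan replaces the binary heap to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

#define NCHILD 3
#define MAXTUP 4

/* Hypothetical slot: isnull plays the role of TupIsNull(). */
typedef struct SketchSlot
{
	bool		isnull;
	int			value;
} SketchSlot;

/* Hypothetical child subplan producing a sorted list of ints. */
typedef struct SketchChild
{
	int			tuples[MAXTUP];
	int			ntuples;
	int			pos;				/* next tuple to return */
	bool		is_async;
	bool		request_pending;	/* initial async request in flight */
} SketchChild;

/* Synchronous fetch, standing in for ExecProcNode() on a child. */
static SketchSlot
sketch_exec_proc_node(SketchChild *c)
{
	SketchSlot	s = {true, 0};

	if (c->pos < c->ntuples)
	{
		s.isnull = false;
		s.value = c->tuples[c->pos++];
	}
	return s;
}

/* Simulated event wait: deliver one outstanding async result into slots[]. */
static void
sketch_wait_for_one(SketchChild *children, SketchSlot *slots, int *nremaining)
{
	/* scan from the end to mimic out-of-order arrival */
	for (int i = NCHILD - 1; i >= 0; i--)
	{
		if (children[i].request_pending)
		{
			slots[i] = sketch_exec_proc_node(&children[i]);
			children[i].request_pending = false;
			(*nremaining)--;
			return;
		}
	}
	assert(!"no pending async request");
}

/* Merge all children into out[]; returns the number of values produced. */
static int
sketch_merge_append(SketchChild *children, int *out, int outcap)
{
	SketchSlot	slots[NCHILD];
	int			nremaining = 0;
	int			nout = 0;

	/* Phase 1a: fetch sync children directly, fire requests for async ones. */
	for (int i = 0; i < NCHILD; i++)
	{
		if (children[i].is_async)
		{
			slots[i] = (SketchSlot) {true, 0};
			children[i].request_pending = true;
			nremaining++;
		}
		else
			slots[i] = sketch_exec_proc_node(&children[i]);
	}

	/* Phase 1b: simple wait loop until every initial request has answered. */
	while (nremaining > 0)
		sketch_wait_for_one(children, slots, &nremaining);

	/* Phase 2: steady state, synchronous fetch from the child just consumed. */
	while (nout < outcap)
	{
		int			best = -1;

		for (int i = 0; i < NCHILD; i++)
			if (!slots[i].isnull &&
				(best < 0 || slots[i].value < slots[best].value))
				best = i;
		if (best < 0)
			break;				/* all children exhausted */
		out[nout++] = slots[best].value;
		slots[best] = sketch_exec_proc_node(&children[best]);
	}
	return nout;
}
```

Only phase 1 overlaps work across children; each phase 2 step touches a single child, which is why the steady state gains nothing from the async machinery.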

Additionally, I made the following changes:
1) Added a WAIT_EVENT_MERGE_APPEND_READY wait event instead of extending
WAIT_EVENT_APPEND_READY. That should be less confusing for monitoring
purposes.
2) More tests: error handling with broken partition, plan-time
partition pruning, and run-time partition pruning tests for async
MergeAppend.

I'm going to go through this patchset once more tomorrow and push
it on Monday if there are no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment                                                       Content-Type              Size
v17-0001-mark_async_capable-subpath-should-match-subplan.patch   application/octet-stream  3.0 KB
v17-0002-Introduce-AppendBase-AppendBaseState-base-types-.patch  application/octet-stream  64.9 KB
v17-0003-Extract-common-Append-MergeAppend-executor-logic.patch  application/octet-stream  23.9 KB
v17-0004-Move-async-infrastructure-into-shared-AppendBase.patch  application/octet-stream  19.2 KB
v17-0005-MergeAppend-should-support-Async-Foreign-Scan-su.patch  application/octet-stream  44.3 KB
