| From: | "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> |
|---|---|
| To: | "Alexander Korotkov" <aekorotkov(at)gmail(dot)com> |
| Cc: | "Alexander Pyhalov" <a(dot)pyhalov(at)postgrespro(dot)ru>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Asynchronous MergeAppend |
| Date: | 2026-04-06 23:48:05 |
| Message-ID: | DHMH23M7UOFS.12W6PUDI1I3NH@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sat Apr 4, 2026 at 11:24 PM -03, Alexander Korotkov wrote:
> I made more work on the patchset.
>
> Patch #1 now considers IncrementalSort as exclusion alongside with
> Sort. Exclusion check is now on the top of the switch().
> Patch #2 is split into 3 patches: common structures, common sync
> append logic, and common async append logic.
> New structs are now named AppendBase/AppendBaseState, corresponding
> fields are "ab" and "as".
>
> Most importantly I noted that this patchset actually only makes
> initial heap filling asynchronous. The steady work after that is
> still syncnronous. Even that it used async infrastructure, it fetched
> tuples from children subplans one-by-one: effectively synchronous but
> paying for asynchronous infrastructure. I think even with this
> limitation, this patchset is valuable: the startup cost for children
> foreignscans can be high. But this understanding allowed me to
> significantly simplify the main patch including:
> 1) After initial heap filling, use ExecProcNode() to fetch from children plans.
> 2) Remove ms_has_asyncresults entirely. Async responses store directly
> into ms_slots[] (the existing heap slot array), which serves as both
> the merge state and the "result arrived" indicator via TupIsNull().
> 3) Removed needrequest usage from MergeAppend. Since MergeAppend only
> fires initial requests (via ExecAppendBaseAsyncBegin()) and never
> sends follow-up requests, needrequest tracking is unnecessary.
> ExecMergeAppendAsyncRequest() was eliminated entirely.
> 4) ExecMergeAppendAsyncGetNext() reduced to a simple wait loop:
> 5) asyncresults allocation reduced back to nasyncplans. MergeAppend
> doesn't use it (stores in ms_slots), and Append only needs nasyncplans
> entries for its stack.
>
> Additionally, I made the following changes.
> 1) WAIT_EVENT_MERGE_APPEND_READY wait event instead of extending
> WAIT_EVENT_APPEND_READY. That should be less confusing for monitoring
> purposes.
> 2) More tests: error handling with broken partition, plan-time
> partition pruning, and run-time partition pruning tests for async
> MergeAppend.
>
Thanks for v17, the split of 0002 into multiple patches seems much
better. Overall I agree with the changes on v17 compared with v16, the
removal of ms_has_asyncresults makes the code better to read and follow.
The separate WAIT_EVENT_MERGE_APPEND_READY for better monitoring is also
good, I've tested some long running queries and I've find the event on
pg_stat_activity. The steady work changes also looks good.
One minor issue on 0002:
+ bool valid_subplans_identified; /* is valid_asyncplans valid? */
+ Bitmapset *valid_subplans;
+ Bitmapset *valid_asyncplans; /* valid asynchronous plans indexes */
Should be /* is valid_subplans valid? */
-----
Minor comment on 0005:
+static void
+ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan)
+{
+ /*
+ * All initial async requests were fired by ExecAppendBaseAsyncBegin.
Wondering if we should reference ExecMergeAppendAsyncBegin() instead of
ExecAppendBaseAsyncBegin() since this is on nodeMergeAppend, what do you
think?
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mihail Nikalayeu | 2026-04-06 23:53:02 | Re: Adding REPACK [concurrently] |
| Previous Message | Zsolt Parragi | 2026-04-06 23:36:27 | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |