From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Organize working memory under per-PlanState context |
Date: | 2025-08-20 05:38:35 |
Message-ID: | 6A4A345C-6960-4F93-A679-F848EF672697@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Aug 20, 2025, at 07:34, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
>
> <v1-0001-Use-per-PlanState-memory-context-for-work-mem.patch>
I understand this is not a final patch, so I would focus on the design:
1. This patch adds ps_WorkMem to PlanState, and make it as parent for other per-node memory contexts, thus all memory usage with that node is grouped and measurable, which is good.
I am thinking that, this change may lead to a chance to free those per-node memory usage earlier once a node finishes. Before this patch, those node execution memory contexts’ parent/grandparent is portal context, memories will only be released once a query is done. Now, with this change, maybe we can free per-node memory earlier, which may benefit large complex query executions.
I know some memory must be retained until the entire query finishes. But those per-node memories, such as hash table, might be destroyed immediately after a node finishes.
2. For the new function ValidateWorkMemParent(), the logic is that, if a parent is given, then the parent must be a child of workmem; otherwise use the node’s ps_WorkMem as parent. Can you add a comment to explain why design like this? And when a parent should be explicitly specified?
3. The new function WorkMemClamBlockSize() will change the existing logic, because it always use Min(maxBlockSize, size). But nodeAgg.c, the old logic is:
/*
* Like CreateWorkExprContext(), use smaller sizings for smaller work_mem,
* to avoid large jumps in memory usage.
*/
maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
aggstate->hash_tablecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
"HashAgg table context",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
maxBlockSize);
The new function doesn’t cover the logic of “no smaller than ALLOCSET_DEFAULT_INTSIZE. Is logic change intended?
4. You add MemoryContextSwitchTo(GetWorkMem(ps)) wrapper around all tuplesort_begin_heap():
oldcontext = MemoryContextSwitchTo(GetWorkMem(&aggstate->ss.ps));
aggstate->sort_out = tuplesort_begin_heap(tupDesc,
sortnode->numCols,
sortnode->sortColIdx,
sortnode->sortOperators,
sortnode->collations,
sortnode->nullsFirst,
work_mem,
NULL, TUPLESORT_NONE);
MemoryContextSwitchTo(oldcontext);
Should we just pass in a memory context into tuplesort_begin_heap()? Also, for the parameter “worker_mem”, should we change it to GetWorkMemLimit()?
5. For the new function CheckWorkMemLimit():
bool
CheckWorkMemLimit(PlanState *ps)
{
size_t allocated = MemoryContextMemAllocated(GetWorkMem(ps), true);
return allocated <= ps->ps_WorkMemLimit;
}
It calls GetWorkMem(), and GetWorkMem() will create work men if it is null. Meaning that, once a caller calls CheckWorkMemLimit() against a node, if the node has no work mem yet, then it will automatically create workmem for the node. Will that lead to unneeded workmem creation? If workmem is null, that means workmem is not used, thus not exceeding limit.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-08-20 06:05:17 | Re: ReplicationSlotRelease() crashes when the instance is in the single user mode |
Previous Message | Ajin Cherian | 2025-08-20 05:24:23 | Re: Improve pg_sync_replication_slots() to wait for primary to advance |