Re: Organize working memory under per-PlanState context

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Organize working memory under per-PlanState context
Date: 2025-08-20 05:38:35
Message-ID: 6A4A345C-6960-4F93-A679-F848EF672697@gmail.com
Lists: pgsql-hackers

> On Aug 20, 2025, at 07:34, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
>
> <v1-0001-Use-per-PlanState-memory-context-for-work-mem.patch>

I understand this is not a final patch, so I will focus on the design:

1. This patch adds ps_WorkMem to PlanState and makes it the parent of the other per-node memory contexts, so all memory usage for a node is grouped and measurable, which is good.

I am thinking that this change may open up a chance to free that per-node memory earlier, once a node finishes. Before this patch, those node-execution memory contexts had the portal context as their parent/grandparent, so the memory was only released once the whole query was done. Now, with this change, maybe we can free per-node memory earlier, which may benefit large, complex query executions.

I know some memory must be retained until the entire query finishes, but per-node memory such as a hash table might be destroyed immediately after its node finishes. A rough sketch of what I have in mind is below.
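
(Everything here other than ps_WorkMem, which comes from your patch, is a hypothetical sketch of where an early release could happen, not something the patch does today.)

static void
ReleaseNodeWorkMem(PlanState *ps)
{
    /*
     * Everything allocated for the node (hash table, tuplestore, ...) is
     * parented under ps_WorkMem, so deleting it releases it all at once,
     * e.g. at the end of ExecEndNode() for that node.
     */
    if (ps->ps_WorkMem != NULL)
    {
        MemoryContextDelete(ps->ps_WorkMem);
        ps->ps_WorkMem = NULL;
    }
}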

2. For the new function ValidateWorkMemParent(), the logic is: if a parent is given, the parent must be a child of the node's workmem context; otherwise the node's ps_WorkMem is used as the parent. Could you add a comment explaining why it is designed this way, and when a parent should be explicitly specified? (My reading of the logic is sketched below.)
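
Roughly, this is how I read it (my paraphrase of the behavior, not the patch's actual code; the descendant check is just written out by hand for illustration):

static MemoryContext
ValidateWorkMemParent(PlanState *ps, MemoryContext parent)
{
    /* No explicit parent given: hang the new context directly off ps_WorkMem. */
    if (parent == NULL)
        return GetWorkMem(ps);

    /*
     * Otherwise the caller-supplied parent must itself live under the
     * node's workmem context, so its usage is still accounted there.
     */
    {
        MemoryContext cxt = parent;

        while (cxt != NULL && cxt != GetWorkMem(ps))
            cxt = cxt->parent;
        Assert(cxt != NULL);
    }

    return parent;
}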

3. The new function WorkMemClamBlockSize() changes the existing logic, because it always uses Min(maxBlockSize, size). But in nodeAgg.c, the old logic is:

/*
 * Like CreateWorkExprContext(), use smaller sizings for smaller work_mem,
 * to avoid large jumps in memory usage.
 */
maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);

/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);

/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);

aggstate->hash_tablecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
                                            "HashAgg table context",
                                            ALLOCSET_DEFAULT_MINSIZE,
                                            ALLOCSET_DEFAULT_INITSIZE,
                                            maxBlockSize);

The new function doesn't cover the "and no smaller than ALLOCSET_DEFAULT_INITSIZE" part. Is this logic change intended? (A sketch of what I mean follows.)
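
For example, I would have expected the clamping to keep both bounds, roughly like this (illustrative helper with a made-up name, not the patch's code):

static Size
clamp_workmem_block_size(Size size, Size maxBlockSize)
{
    /* no bigger than the work_mem-derived size ... */
    maxBlockSize = Min(maxBlockSize, size);

    /* ... and no smaller than ALLOCSET_DEFAULT_INITSIZE, as nodeAgg.c did */
    return Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
}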

4. You add a MemoryContextSwitchTo(GetWorkMem(ps)) wrapper around all the tuplesort_begin_heap() calls:

oldcontext = MemoryContextSwitchTo(GetWorkMem(&aggstate->ss.ps));

aggstate->sort_out = tuplesort_begin_heap(tupDesc,
                                          sortnode->numCols,
                                          sortnode->sortColIdx,
                                          sortnode->sortOperators,
                                          sortnode->collations,
                                          sortnode->nullsFirst,
                                          work_mem,
                                          NULL, TUPLESORT_NONE);

MemoryContextSwitchTo(oldcontext);

Should we just pass a memory context into tuplesort_begin_heap()? Also, for the work_mem parameter, should we change it to GetWorkMemLimit()? A rough sketch of what I mean is below.
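
Something along these lines (purely hypothetical: tuplesort_begin_heap() has no context-taking variant today, and I am assuming GetWorkMemLimit() takes the PlanState):

/* hypothetical variant that takes the context and limit explicitly, so
 * callers don't need the MemoryContextSwitchTo() dance */
aggstate->sort_out = tuplesort_begin_heap_in_context(GetWorkMem(&aggstate->ss.ps),
                                                     tupDesc,
                                                     sortnode->numCols,
                                                     sortnode->sortColIdx,
                                                     sortnode->sortOperators,
                                                     sortnode->collations,
                                                     sortnode->nullsFirst,
                                                     GetWorkMemLimit(&aggstate->ss.ps),
                                                     NULL, TUPLESORT_NONE);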

5. For the new function CheckWorkMemLimit():

bool
CheckWorkMemLimit(PlanState *ps)
{
    size_t allocated = MemoryContextMemAllocated(GetWorkMem(ps), true);
    return allocated <= ps->ps_WorkMemLimit;
}

It calls GetWorkMem(), and GetWorkMem() will create the workmem context if it is NULL. That means that once a caller calls CheckWorkMemLimit() against a node that has no workmem context yet, one will automatically be created for it. Will that lead to unneeded workmem creation? If workmem is NULL, that means workmem is not used at all, and thus the limit cannot be exceeded. A possible shape is sketched below.
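
For example (sketch only, based on my reading of the patch's fields):

bool
CheckWorkMemLimit(PlanState *ps)
{
    size_t allocated;

    /*
     * No workmem context yet means nothing has been allocated under it,
     * so we are trivially within the limit; avoid creating the context
     * just to measure it.
     */
    if (ps->ps_WorkMem == NULL)
        return true;

    allocated = MemoryContextMemAllocated(ps->ps_WorkMem, true);
    return allocated <= ps->ps_WorkMemLimit;
}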

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
