Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Cédric Villemain <cedric(dot)villemain(at)data-bene(dot)io>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach
Date:	2025-07-10 15:20:50
Message-ID:	20537e21-3a32-4941-91eb-20bdfdb96e26@vondra.me
Lists:	pgsql-hackers

On 7/9/25 08:40, Cédric Villemain wrote:
>> On 7/8/25 18:06, Cédric Villemain wrote:
>>>
>>>> On 7/8/25 03:55, Cédric Villemain wrote:
>>>>> Hi Andres,
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2025-07-05 07:09:00 +0000, Cédric Villemain wrote:
>>>>>>> In my work on more careful PostgreSQL resource management, I've come
>>>>>>> to the
>>>>>>> conclusion that we should avoid pushing policy too deeply into the
>>>>>>> PostgreSQL core itself. Therefore, I'm quite skeptical about
>>>>>>> integrating
>>>>>>> NUMA-specific management directly into core PostgreSQL in such a
>>>>>>> way.
>>>>>>
>>>>>> I think it's actually the opposite - whenever we pushed stuff like
>>>>>> this
>>>>>> outside of core it has hurt postgres substantially. Not having
>>>>>> replication in
>>>>>> core was a huge mistake. Not having HA management in core is
>>>>>> probably the
>>>>>> biggest current adoption hurdle for postgres.
>>>>>>
>>>>>> To deal better with NUMA we need to improve memory placement and
>>>>>> various
>>>>>> algorithms, in an interrelated way - that's pretty much impossible
>>>>>> to do
>>>>>> outside of core.
>>>>>
>>>>> Except the backend pinning which is easy to achieve, thus my
>>>>> comment on
>>>>> the related patch.
>>>>> I'm not claiming NUMA memory and all should be managed outside of core
>>>>> (though I didn't read other patches yet).
>>>>>
>>>>
>>>> But an "optimal backend placement" seems to very much depend on
>>>> where we
>>>> placed the various pieces of shared memory. Which the external module
>>>> will have trouble following, I suspect.
>>>>
>>>> I still don't have any idea what exactly would the external module do,
>>>> how would it decide where to place the backend. Can you describe some
>>>> use case with an example?
>>>>
>>>> Assuming we want to actually pin tasks from within Postgres, what I
>>>> think might work is allowing modules to "advise" on where to place the
>>>> task. But the decision would still be done by core.
>>>
>>> Possibly exactly what you're doing in proc.c when managing allocation of
>>> process, but not hardcoded in postgresql (patches 02, 05 and 06 are good
>>> candidates), I didn't get that they require information not available to
>>> any process executing code from a module.
>>>
>>
>> Well, it needs to understand how some other stuff (especially PGPROC
>> entries) is distributed between nodes. I'm not sure how much of this
>> internal information we want to expose outside core ...
>>
>>> Parts of your code where you assign/define policy could be in one or
>>> more relevant routines of a "numa profile manager", like in an
>>> initProcessRoutine(), and registered in pmroutine struct:
>>>
>>>     pmroutine = GetPmRoutineForInitProcess();
>>>     if (pmroutine != NULL &&
>>>         pmroutine->init_process != NULL)
>>>         pmroutine->init_process(MyProc);
>>>
>>> This way it's easier to manage alternative policies, and also to be able
>>> to adjust when hardware and linux kernel changes.
>>>
>>
>> I'm not against making this extensible, in some way. But I still
>> struggle to imagine a reasonable alternative policy, where the external
>> module gets the same information and ends up with a different decision.
>>
>> So what would the alternate policy look like? What use case would the
>> module be supporting?
>
>
> That's the whole point: there are very distinct usages of PostgreSQL in
> the field. And maybe not all of them will require the policy defined by
> PostgreSQL core.
>
> May I ask the reverse: what prevent external modules from taking those
> decisions ? There are already a lot of area where external code can take
> over PostgreSQL processing, like Neon is doing.
>

The complexity of making everything extensible in an arbitrary way. To
make it extensible in a useful way, we need to have a reasonably clear
idea of which aspects need to be extensible, and what the goal is.
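
For illustration only, here is a minimal sketch of what a narrowly scoped
extension point could look like, following the "modules advise, core
decides" idea quoted earlier in this thread. Every name in it
(NumaPlacementInfo, numa_advise_hook, choose_backend_node,
advise_last_node) is hypothetical and does not exist in PostgreSQL or in
any of the posted patches; which information core would actually expose
is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the placement information core chooses to expose. */
typedef struct NumaPlacementInfo
{
    int     num_nodes;      /* number of NUMA nodes core knows about */
    int     default_node;   /* node core would pick on its own */
} NumaPlacementInfo;

/*
 * Hypothetical hook type: a module returns a suggested node,
 * or -1 to express no opinion.
 */
typedef int (*numa_advise_hook_type) (const NumaPlacementInfo *info);

/* Hypothetical hook variable; NULL means no module is installed. */
static numa_advise_hook_type numa_advise_hook = NULL;

/*
 * Core keeps the final decision: the advice is used only when it
 * names a valid node, otherwise core falls back to its own choice.
 */
static int
choose_backend_node(const NumaPlacementInfo *info)
{
    int     advice;

    assert(info != NULL && info->num_nodes > 0);

    if (numa_advise_hook == NULL)
        return info->default_node;

    advice = numa_advise_hook(info);
    if (advice < 0 || advice >= info->num_nodes)
        return info->default_node;

    return advice;
}

/* Example module policy (assumption): always suggest the last node. */
static int
advise_last_node(const NumaPlacementInfo *info)
{
    return info->num_nodes - 1;
}
```

A module would install its policy by assigning advise_last_node to
numa_advise_hook at load time. The point of the sketch is that the
interface stays small and core can still reject advice it does not like,
which is only possible once it is clear what the module is meant to
customize.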

> There are some very early processing for memory setup that I can see as
> a current blocker, and here I'd refer a more compliant NUMA api as
> proposed by Jakub so it's possible to arrange based on workload,
> hardware configuration or other matters. Reworking to get distinct
> segment and all as you do is great, and combo of both approach probably
> of great interest. There is also this weighted interleave discussed and
> probably much more to come in this area in Linux.
>
> I think some points raised already about possible distinct policies, I
> am precisely claiming that it is hard to come with one good policy with
> limited setup options, thus requirement to keep that flexible enough
> (hooks, api, 100 GUc ?).
>

I'm sorry, I don't want to sound too negative, but "I want arbitrary
extensibility" is not very useful feedback. I've asked you to give
some examples of policies that'd customize some of the NUMA stuff.

> There is an EPYC story here also, given the NUMA setup can vary
> depending on BIOS setup, associated NUMA policy must probably take that
> into account (L3 can be either real cache or 4 extra "local" NUMA nodes
> - with highly distinct access cost from a RAM module).
> Does that change how PostgreSQL will place memory and process? Is it
> important or of interest ?
>

So how exactly would the policy handle this? Right now we're entirely
oblivious to L3, or on-CPU caches in general. We don't even consider the
size of L3 when sizing hash tables in a hashjoin etc.

regards

--
Tomas Vondra

In response to

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach at 2025-07-09 06:40:00 from Cédric Villemain
