Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Cédric Villemain <cedric(dot)villemain(at)data-bene(dot)io>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach
Date:	2025-07-08 21:26:06
Message-ID:	89c1f26c-977f-44e2-9d78-ddff7c8268b2@vondra.me
Lists:	pgsql-hackers

On 7/8/25 18:06, Cédric Villemain wrote:
>
>> On 7/8/25 03:55, Cédric Villemain wrote:
>>> Hi Andres,
>>>
>>>> Hi,
>>>>
>>>> On 2025-07-05 07:09:00 +0000, Cédric Villemain wrote:
>>>>> In my work on more careful PostgreSQL resource management, I've come
>>>>> to the
>>>>> conclusion that we should avoid pushing policy too deeply into the
>>>>> PostgreSQL core itself. Therefore, I'm quite skeptical about
>>>>> integrating
>>>>> NUMA-specific management directly into core PostgreSQL in such a way.
>>>>
>>>> I think it's actually the opposite - whenever we pushed stuff like this
>>>> outside of core it has hurt postgres substantially. Not having
>>>> replication in
>>>> core was a huge mistake. Not having HA management in core is
>>>> probably the
>>>> biggest current adoption hurdle for postgres.
>>>>
>>>> To deal better with NUMA we need to improve memory placement and
>>>> various
>>>> algorithms, in an interrelated way - that's pretty much impossible
>>>> to do
>>>> outside of core.
>>>
>>> Except the backend pinning which is easy to achieve, thus my comment on
>>> the related patch.
>>> I'm not claiming NUMA memory and all should be managed outside of core
>>> (though I didn't read other patches yet).
>>>
>>
>> But an "optimal backend placement" seems to very much depend on where we
>> placed the various pieces of shared memory. Which the external module
>> will have trouble following, I suspect.
>>
>> I still don't have any idea what exactly would the external module do,
>> how would it decide where to place the backend. Can you describe some
>> use case with an example?
>>
>> Assuming we want to actually pin tasks from within Postgres, what I
>> think might work is allowing modules to "advise" on where to place the
>> task. But the decision would still be done by core.
>
> Possibly exactly what you're doing in proc.c when managing allocation of
> processes, but not hardcoded in PostgreSQL (patches 02, 05 and 06 are good
> candidates). I didn't get that they require information not available to
> any process executing code from a module.
>

Well, it needs to understand how some other stuff (especially PGPROC
entries) is distributed between nodes. I'm not sure how much of this
internal information we want to expose outside core ...

> Parts of your code where you assign/define policy could be in one or
> more relevant routines of a "numa profile manager", like in an
> initProcessRoutine(), and registered in pmroutine struct:
>
>     pmroutine = GetPmRoutineForInitProcess();
>     if (pmroutine != NULL &&
>         pmroutine->init_process != NULL)
>         pmroutine->init_process(MyProc);
>
> This way it's easier to manage alternative policies, and also to
> adjust when the hardware and the Linux kernel change.
>
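
For illustration only, the registration pattern quoted above could be fleshed out roughly as follows. This is a minimal standalone sketch, not code from any patch in this thread: the PGPROC stand-in (including its numa_node field), RegisterPmRoutine(), InitProcessWithPolicy() and the round-robin policy are all hypothetical names; only GetPmRoutineForInitProcess() and init_process come from the quoted snippet.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's PGPROC; the real struct is far larger. */
typedef struct PGPROC
{
	int			pgprocno;	/* slot index of this backend */
	int			numa_node;	/* hypothetical field: node chosen for the backend */
} PGPROC;

/* Callbacks that a "NUMA profile manager" module would provide. */
typedef struct PmRoutine
{
	void		(*init_process) (PGPROC *proc);
} PmRoutine;

/* This sketch supports at most one registered routine. */
static const PmRoutine *registered_pmroutine = NULL;

/* Hypothetical registration entry point, called by a module at load time. */
void
RegisterPmRoutine(const PmRoutine *routine)
{
	assert(routine != NULL);
	registered_pmroutine = routine;
}

/* Returns the registered routine, or NULL if no module registered one. */
const PmRoutine *
GetPmRoutineForInitProcess(void)
{
	return registered_pmroutine;
}

/* Example policy (an assumption): alternate backends between two nodes. */
static void
round_robin_init_process(PGPROC *proc)
{
	proc->numa_node = proc->pgprocno % 2;
}

static const PmRoutine round_robin_routine = {
	.init_process = round_robin_init_process
};

/* Core-side call site, mirroring the quoted snippet. */
void
InitProcessWithPolicy(PGPROC *proc)
{
	const PmRoutine *pmroutine = GetPmRoutineForInitProcess();

	if (pmroutine != NULL &&
		pmroutine->init_process != NULL)
		pmroutine->init_process(proc);
}
```

A module would call RegisterPmRoutine(&round_robin_routine) once at load time; with nothing registered, InitProcessWithPolicy() leaves the PGPROC stand-in untouched.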

I'm not against making this extensible, in some way. But I still
struggle to imagine a reasonable alternative policy, where the external
module gets the same information and ends up with a different decision.

So what would the alternate policy look like? What use case would the
module be supporting?
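
To make the "advise" idea from earlier in the thread a bit more concrete (the module suggests a node, core still decides), a rough sketch might look like the following. Every name here (numa_advise_hook, ChooseBackendNode, the example advice functions) is hypothetical and only meant to illustrate the split of responsibility; it assumes core already knows which node holds the backend's PGPROC entry.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NODE_NONE	(-1)

/*
 * Hypothetical advice callback: given the node holding the backend's
 * PGPROC entry and the number of nodes, return a preferred node or
 * NUMA_NODE_NONE for "no opinion".
 */
typedef int (*numa_advise_hook_type) (int pgproc_node, int num_nodes);

static numa_advise_hook_type numa_advise_hook = NULL;

/*
 * Core makes the final decision: the advice is used only if it names a
 * valid node; otherwise the backend stays on the node of its PGPROC entry.
 */
int
ChooseBackendNode(int pgproc_node, int num_nodes)
{
	int			advised = NUMA_NODE_NONE;

	assert(num_nodes > 0 && pgproc_node >= 0 && pgproc_node < num_nodes);

	if (numa_advise_hook != NULL)
		advised = numa_advise_hook(pgproc_node, num_nodes);

	if (advised >= 0 && advised < num_nodes)
		return advised;

	return pgproc_node;
}

/* Example module advice (an assumption): prefer the neighbouring node. */
static int
advise_next_node(int pgproc_node, int num_nodes)
{
	return (pgproc_node + 1) % num_nodes;
}

/* Example of bad advice that core is expected to ignore. */
static int
advise_out_of_range(int pgproc_node, int num_nodes)
{
	(void) pgproc_node;
	return num_nodes + 5;
}
```

The point of the split is that a module can only influence placement within what core considers valid, so core remains free to change how PGPROC entries and other shared memory are distributed.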

regards

--
Tomas Vondra
