Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

From: Cédric Villemain <cedric(dot)villemain(at)data-bene(dot)io>
To: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach
Date: 2025-07-08 01:47:00
Message-ID: 1cd9d7e0-d1ae-4c74-91b5-fbbf8046ee56@data-bene.io
Lists: pgsql-hackers

> On 7/7/25 16:51, Cédric Villemain wrote:
>>>> * Others might use it to integrate PostgreSQL's own resources (e.g.,
>>>> "areas" of shared buffers) into policies.
>>>>
>>>> Hope this perspective is helpful.
>>>
>>> Can you explain how you want to manage this by an extension defined at
>>> the SQL level, when most of this stuff has to be done when setting up
>>> shared memory, which is waaaay before we have any access to catalogs?
>>
>> I should have said module instead. I didn't follow carefully, but at some
>> point there was discussion about resizing shared buffers "on-line".
>> Anyway, it was just to give a few examples; maybe this one is to be
>> considered later (I'm focused on cgroup/psi, and specifically on
>> reassigning PIDs as needed).
>>
>
> I don't know. I have a hard time imagining what exactly the
> policies / profiles would do to respond to changes in the system
> utilization. And why should that interfere with this patch ...
>
> The main thing the patch series aims to implement is partitioning
> different pieces of shared memory (buffers, freelists, ...) to better
> work for NUMA. I don't think there are that many ways to do this, and I
> doubt it makes sense to make this easily customizable from external
> modules of any kind. I can imagine providing some API that allows
> isolating the instance on selected NUMA nodes, but that's about it.
>
> Yes, there's some relation to the online resizing of shared buffers, in
> which case we need to "refresh" some of the information. But AFAICS it's
> not very extensive (on top of what already needs to happen after the
> resize), and it'd happen within the boundaries of the partitioning
> scheme. There's not that much flexibility.
>
> The last bit (pinning backends to a NUMA node) is experimental, and
> mostly intended for easier evaluation of the earlier parts (e.g. to
> limit the noise when processes get moved to a CPU from a different NUMA
> node, and so on).

Maybe the backend pinning could be done by replacing your patch to proc.c
with a call to an external profile manager that does exactly the same thing?

Similar to:
    pmroutine = GetPmRoutineForInitProcess();
    if (pmroutine != NULL &&
        pmroutine->init_process != NULL)
        pmroutine->init_process(MyProc);

    ...

    pmroutine = GetPmRoutineForInitAuxilliary();
    if (pmroutine != NULL &&
        pmroutine->init_auxilliary != NULL)
        pmroutine->init_auxilliary(MyProc);

Adding such calls in a few places should cover most, if not all, of the
requirements around process placement (process_shared_preload_libraries()
is called earlier during process creation, I believe).
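
To make the idea a bit more concrete, here is a minimal standalone sketch of
how such a callback table could be registered by a module and consulted at
the call sites. Everything in it is an assumption for illustration only:
MockProc stands in for PGPROC, and PmRoutine, RegisterPmRoutine(),
GetPmRoutine(), the *Sketch() call sites and the "pid modulo 2" placement
rule are hypothetical names and behavior, not existing PostgreSQL APIs and
not what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PGPROC; holds only what this sketch needs (assumption). */
typedef struct MockProc
{
    int pid;
    int numa_node;              /* -1 means "not placed by any module" */
} MockProc;

/* Hypothetical callback table that a profile manager module fills in. */
typedef struct PmRoutine
{
    void (*init_process) (MockProc *proc);
    void (*init_auxiliary) (MockProc *proc);
} PmRoutine;

/* Set by a module at load time; NULL when no module is loaded. */
static const PmRoutine *registered_routine = NULL;

/* Hypothetical registration entry point, called once by the module. */
void
RegisterPmRoutine(const PmRoutine *routine)
{
    assert(routine != NULL);
    registered_routine = routine;
}

/* Hypothetical accessor used by the call sites in core. */
const PmRoutine *
GetPmRoutine(void)
{
    return registered_routine;
}

/* Call site for regular backends: a no-op unless a callback is set. */
void
InitProcessSketch(MockProc *proc)
{
    const PmRoutine *pm = GetPmRoutine();

    proc->numa_node = -1;
    if (pm != NULL && pm->init_process != NULL)
        pm->init_process(proc);
}

/* Call site for auxiliary processes, following the same pattern. */
void
InitAuxiliarySketch(MockProc *proc)
{
    const PmRoutine *pm = GetPmRoutine();

    proc->numa_node = -1;
    if (pm != NULL && pm->init_auxiliary != NULL)
        pm->init_auxiliary(proc);
}

/* Example module callback: spread backends over two nodes by pid. */
static void
example_init_process(MockProc *proc)
{
    proc->numa_node = proc->pid % 2;
}

/* The example module only handles regular backends. */
static const PmRoutine example_routine = {example_init_process, NULL};
```

A module would call RegisterPmRoutine(&example_routine) when it is loaded;
without a registered routine, both call sites leave the process unplaced, so
core behavior is unchanged when no module is present.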

--
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D
