Re: Adding basic NUMA awareness

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding basic NUMA awareness
Date: 2025-09-18 21:04:45
Message-ID: 659c44a5-f616-492c-ab81-60273d2fe7f6@vondra.me
Lists: pgsql-hackers

On 9/11/25 10:32, Tomas Vondra wrote:
> ...
>
> 8) I've realized some of the TAP tests occasionally fail with
>
> ERROR: no unpinned buffers
>
> and I think I know why. Some of the tests set shared_buffers to a very
> low value - like 1MB or even 128kB, and StrategyGetBuffer() may search
> only a single partition (but not always). We may run out of unpinned
> buffers in that one partition.
>
> This apparently happens more easily on rpi5, due to the weird NUMA
> layout (there are 8 nodes with memory, but getcpu() reports node 0 for
> all cores).
>
> I suspect the correct fix is to ensure StrategyGetBuffer() scans all
> partitions, if there are no unpinned buffers in the current one. On
> realistic setups this shouldn't happen very often, I think.
>
> The other issue I just realized is that StrategyGetBuffer() recalculates
> the partition index over and over, which seems unnecessary (and possibly
> expensive, due to the modulo). And it also does too many loops, because
> it used NBuffers instead of the partition size. I'll fix those later.

Here's a version fixing the "no unpinned buffers" failures (in the 0006
part). It modifies StrategyGetBuffer() to walk through all the
partitions in a round-robin manner, starting from the current one. The
way it steps to the next partition is a bit ugly, but it works, and
I'll think about a better way.
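
To illustrate the idea, here is a minimal standalone sketch of a
round-robin sweep over partitions. It is not the code from the 0006
patch: SketchPartition, sketch_get_buffer() and the pin_counts array
are hypothetical stand-ins for the real buffer descriptors, and the
usage-count handling of the actual clock sweep is left out. It only
shows the three points discussed above: the partition is picked once
per pass rather than per buffer, the inner loop is bounded by the
partition size instead of NBuffers, and the search falls through to
the next partition when the current one has no unpinned buffers.

```c
#include <assert.h>

/* Hypothetical, simplified partition descriptor (not PostgreSQL's). */
typedef struct SketchPartition
{
    int first_buffer;   /* index of the first buffer in the partition */
    int num_buffers;    /* number of buffers in the partition */
    int next_victim;    /* clock hand, relative to first_buffer */
} SketchPartition;

/*
 * Return the index of an unpinned buffer, or -1 if every buffer in
 * every partition is pinned. The search starts in partition "home"
 * and visits each partition at most once, in round-robin order.
 */
int
sketch_get_buffer(SketchPartition *parts, int nparts, int home,
                  const int *pin_counts)
{
    assert(nparts > 0 && home >= 0 && home < nparts);

    int part = home;

    for (int visited = 0; visited < nparts; visited++)
    {
        /* Look up the partition once, not once per buffer. */
        SketchPartition *p = &parts[part];

        /* One full sweep of this partition only, not of all buffers. */
        for (int tries = 0; tries < p->num_buffers; tries++)
        {
            int buf = p->first_buffer + p->next_victim;

            /* Advance the clock hand, wrapping without a modulo. */
            if (++p->next_victim == p->num_buffers)
                p->next_victim = 0;

            if (pin_counts[buf] == 0)
                return buf;
        }

        /* Nothing unpinned here, step to the next partition. */
        if (++part == nparts)
            part = 0;
    }

    return -1;  /* the caller would raise "no unpinned buffers" */
}
```

In this sketch the error is reported only after all partitions have
been swept once, so a tiny shared_buffers setting fails only when
every buffer is really pinned, not when just the home partition is.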

I haven't done anything about the other issue (the one with huge pages
reserved on NUMA nodes, and SIGBUS).

regards

--
Tomas Vondra

Attachment Content-Type Size
v20250918-0001-NUMA-shared-buffers-partitioning.patch text/x-patch 41.7 KB
v20250918-0002-NUMA-clockweep-partitioning.patch text/x-patch 35.5 KB
v20250918-0003-NUMA-clocksweep-allocation-balancing.patch text/x-patch 25.3 KB
v20250918-0004-NUMA-weighted-clocksweep-balancing.patch text/x-patch 5.1 KB
v20250918-0005-NUMA-partition-PGPROC.patch text/x-patch 48.7 KB
v20250918-0006-fixup-StrategyGetBuffer.patch text/x-patch 6.6 KB
