From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Adding basic NUMA awareness |
Date: | 2025-09-18 21:04:45 |
Message-ID: | 659c44a5-f616-492c-ab81-60273d2fe7f6@vondra.me |
Lists: | pgsql-hackers |

On 9/11/25 10:32, Tomas Vondra wrote:
> ...
>
> 8) I've realized some of the TAP tests occasionally fail with
>
> ERROR: no unpinned buffers
>
> and I think I know why. Some of the tests set shared_buffers to a very
> low value - like 1MB or even 128kB, and StrategyGetBuffer() may search
> only a single partition (but not always). We may run out of unpinned
> buffers in that one partition.
>
> This apparently happens more easily on rpi5, due to the weird NUMA
> layout (there are 8 nodes with memory, but getcpu() reports node 0 for
> all cores).
>
> I suspect the correct fix is to ensure StrategyGetBuffer() scans all
> partitions, if there are no unpinned buffers in the current one. On
> realistic setups this shouldn't happen very often, I think.
>
> The other issue I just realized is that StrategyGetBuffer() recalculates
> the partition index over and over, which seems unnecessary (and possibly
> expensive, due to the modulo). And it also does too many loops, because
> it used NBuffers instead of the partition size. I'll fix those later.

Here's a version fixing this issue (in the 0006 part). It modifies
StrategyGetBuffer() to walk through all the partitions in a round-robin
manner, starting with the current one. The way it steps to the next
partition is a bit ugly, but it works, and I'll think about a better way.
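
To illustrate the idea, here is a minimal sketch of the round-robin fallback on a toy model. It is not the code from the 0006 patch: the ToyPartition struct, the constants, and the toy_* functions are all hypothetical, and it ignores usage counts, locking, and everything else the real clock sweep does. It only shows starting in the "home" partition, bounding each scan by the partition size rather than the total number of buffers, and computing the partition index once per partition instead of once per buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy layout: not the structures used by the patch. */
#define NUM_PARTITIONS   4
#define PARTITION_SIZE   4      /* buffers per partition */

typedef struct ToyPartition
{
    int     pin_count[PARTITION_SIZE];  /* 0 means the buffer is unpinned */
    int     next_victim;                /* clock hand, local to this partition */
} ToyPartition;

static ToyPartition partitions[NUM_PARTITIONS];

/*
 * Sweep one partition once, starting at its own clock hand.
 * Returns the global id of an unpinned buffer, or -1 if all are pinned.
 */
static int
toy_scan_partition(int part)
{
    ToyPartition *p = &partitions[part];

    /* Bounded by the partition size, not the total number of buffers. */
    for (int tries = 0; tries < PARTITION_SIZE; tries++)
    {
        int     slot = p->next_victim;

        p->next_victim = (p->next_victim + 1) % PARTITION_SIZE;
        if (p->pin_count[slot] == 0)
            return part * PARTITION_SIZE + slot;
    }
    return -1;
}

/*
 * Start in the caller's home partition and fall through to the others in
 * round-robin order. Returns -1 only if every buffer everywhere is pinned.
 */
static int
toy_get_buffer(int home)
{
    assert(home >= 0 && home < NUM_PARTITIONS);

    for (int i = 0; i < NUM_PARTITIONS; i++)
    {
        /* Partition index is computed once per partition, not per buffer. */
        int     part = (home + i) % NUM_PARTITIONS;
        int     buf = toy_scan_partition(part);

        if (buf >= 0)
            return buf;
    }
    return -1;                  /* the "no unpinned buffers" case */
}
```

With every buffer in partition 0 pinned, toy_get_buffer(0) moves on to partition 1 instead of failing; it reports failure only once all partitions have been tried.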

I haven't done anything about the other issue (the one with huge pages
reserved on NUMA nodes, and SIGBUS).

regards

--
Tomas Vondra

Attachment | Content-Type | Size |
---|---|---|
v20250918-0001-NUMA-shared-buffers-partitioning.patch | text/x-patch | 41.7 KB |
v20250918-0002-NUMA-clockweep-partitioning.patch | text/x-patch | 35.5 KB |
v20250918-0003-NUMA-clocksweep-allocation-balancing.patch | text/x-patch | 25.3 KB |
v20250918-0004-NUMA-weighted-clocksweep-balancing.patch | text/x-patch | 5.1 KB |
v20250918-0005-NUMA-partition-PGPROC.patch | text/x-patch | 48.7 KB |
v20250918-0006-fixup-StrategyGetBuffer.patch | text/x-patch | 6.6 KB |