From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding basic NUMA awareness
Date: 2025-08-09 00:25:40
Message-ID: 34xzlt56mbed5cqphipbozhrmmoapkodnnwbzmeal6y3wjc6ia@3lnvssssgkdx
Lists: pgsql-hackers
Hi,
On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> 2) I'm a bit unsure what "NUMA nodes" actually means. The patch mostly
> assumes each core / piece of RAM is assigned to a particular NUMA node.
There are systems in which some NUMA nodes do *not* contain any CPUs, e.g. if
you attach memory via a CXL/PCIe add-in card rather than via the CPU's memory
controller. In that case numactl -H (and obviously also the libnuma APIs) will
report that the NUMA node is not associated with any CPU.
I don't currently have live access to such a system, but this Lenovo press
piece happens to include numactl -H output:
https://lenovopress.lenovo.com/lp2184-implementing-cxl-memory-on-linux-on-thinksystem-v4-servers
> numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
> node 0 size: 1031904 MB
> node 0 free: 1025554 MB
> node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
> node 1 size: 1032105 MB
> node 1 free: 1024244 MB
> node 2 cpus:
> node 2 size: 262144 MB
> node 2 free: 262143 MB
> node 3 cpus:
> node 3 size: 262144 MB
> node 3 free: 262142 MB
> node distances:
> node 0 1 2 3
> 0: 10 21 14 24
> 1: 21 10 24 14
> 2: 14 24 10 26
> 3: 24 14 26 10
Note that node 2 & 3 don't have associated CPUs (and higher access costs).
I don't think this is common enough to worry about from a performance POV, but
we probably shouldn't crash if we encounter it...
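For reference, here's a minimal sketch (not from the patch, just an
illustration assuming the libnuma v2 API and linking with -lnuma) of how
memory-only nodes could be detected, so they can be skipped rather than
tripped over:

#include <stdio.h>
#include <numa.h>

int
main(void)
{
	if (numa_available() < 0)
	{
		fprintf(stderr, "libnuma not available\n");
		return 1;
	}

	struct bitmask *cpus = numa_allocate_cpumask();

	for (int node = 0; node <= numa_max_node(); node++)
	{
		/* fill "cpus" with the CPUs belonging to "node" */
		if (numa_node_to_cpus(node, cpus) < 0)
			continue;

		/* a node with memory but an empty cpumask, e.g. CXL memory */
		if (numa_bitmask_weight(cpus) == 0)
			printf("node %d has no CPUs\n", node);
	}

	numa_free_cpumask(cpus);
	return 0;
}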
> But it also cares about the cores (and the node for each core), because
> it uses that to pick the right partition for a backend. And here the
> situation is less clear, because the CPUs don't need to be assigned to a
> particular node, even on a NUMA system. Consider the rpi5 NUMA layout:
>
> $ numactl --hardware
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3
> node 0 size: 992 MB
> node 0 free: 274 MB
> node 1 cpus: 0 1 2 3
> node 1 size: 1019 MB
> node 1 free: 327 MB
> ...
> node 0 1 2 3 4 5 6 7
> 0: 10 10 10 10 10 10 10 10
> 1: 10 10 10 10 10 10 10 10
> 2: 10 10 10 10 10 10 10 10
> 3: 10 10 10 10 10 10 10 10
> 4: 10 10 10 10 10 10 10 10
> 5: 10 10 10 10 10 10 10 10
> 6: 10 10 10 10 10 10 10 10
> 7: 10 10 10 10 10 10 10 10
> This says there are 8 NUMA nodes, each with ~1GB of RAM. But the 4 cores
> are not assigned to particular nodes - each core is mapped to all 8 NUMA
> nodes.
FWIW, you can get a different version of this with AMD Epyc too, if "L3 LLC as
NUMA" is enabled.
> I'm not sure what to do about this (or how getcpu() or libnuma handle this).
I don't immediately see any libnuma functions that would care?
I also am somewhat curious about what getcpu() returns for the current node...
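Just to have it written down, a throwaway program like this (assuming glibc
>= 2.29 for the getcpu() wrapper, otherwise syscall(SYS_getcpu, ...)) would
show what the kernel reports on such a layout:

#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>

int
main(void)
{
	unsigned int cpu;
	unsigned int node;

	/* ask the kernel which CPU and NUMA node this thread runs on right now */
	if (getcpu(&cpu, &node) != 0)
	{
		perror("getcpu");
		return 1;
	}

	printf("cpu %u, node %u\n", cpu, node);
	return 0;
}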
Greetings,
Andres Freund