| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me> |
| Cc: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Adding basic NUMA awareness |
| Date: | 2026-01-13 14:14:17 |
| Message-ID: | clx4zzd7kau4vvh5ynu5ssxg3jqfqzurgcbtotytzgzkhb3nis@qfl5xwv44yad |
| Lists: | pgsql-hackers |
Hi,
On 2026-01-13 02:13:40 +0100, Tomas Vondra wrote:
> On the azure VM (scale 200, 32GB sb), there's still no difference:
One possibility is that the host is configured with memory interleaving. That
configures memory so that physical addresses interleave across the different
NUMA nodes, instead of each node's memory really being node-local. That can
help avoid bad performance characteristics for NUMA-naive applications.
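As a rough illustration of what interleaving means for placement, here is a toy model. It assumes a simple round-robin mapping at a fixed granularity. Real memory controllers use platform-specific granularities and often hash address bits, so `node_for_address`, `node_histogram` and the 4 KiB default are all hypothetical.

```python
# Toy model of node interleaving: consecutive physical address ranges of
# `granularity` bytes are assigned to NUMA nodes round-robin.
# Hypothetical: real hardware may use other granularities or address hashing.


def node_for_address(addr: int, num_nodes: int, granularity: int = 4096) -> int:
    """Return the NUMA node that would hold physical address `addr` in this model."""
    return (addr // granularity) % num_nodes


def node_histogram(start: int, length: int, num_nodes: int,
                   granularity: int = 4096) -> list:
    """Count how many granularity-sized chunks of [start, start+length) land on each node."""
    counts = [0] * num_nodes
    for chunk_start in range(start, start + length, granularity):
        counts[node_for_address(chunk_start, num_nodes, granularity)] += 1
    return counts


if __name__ == "__main__":
    # Any contiguous region ends up split evenly across both nodes, so there
    # is no "node-local" memory for the application to exploit.
    print(node_histogram(0, 1 << 20, 2))
```

Under such a mapping every backend sees roughly the average of local and remote latency, whichever node it runs on. That would be consistent with NUMA-aware placement making no measurable difference.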
I don't quite know how to figure that out, though, particularly from within a
VM :(. Even something like https://github.com/nviennot/core-to-core-latency
or Intel's mlc will not necessarily help, because the result depends on which
node the measured cacheline ends up on.
But I'd probably still run them, just to see whether you observe very
different latencies between the systems.
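One cheap, and admittedly limited, check from inside a Linux guest is to look at the NUMA distances the kernel was given. They are exposed per node in `/sys/devices/system/node/nodeN/distance`, with 10 meaning local by ACPI SLIT convention. In a VM these values are whatever the hypervisor advertises. They say nothing about whether the host interleaves memory underneath, so treat this hypothetical helper as a sanity check only.

```python
import glob
import os
import re


def read_numa_distances(root: str = "/sys/devices/system/node") -> dict:
    """Map node id -> list of kernel-reported distances to every node (Linux sysfs)."""
    distances = {}
    for path in glob.glob(os.path.join(root, "node*", "distance")):
        # Extract N from the ".../nodeN" directory name.
        match = re.search(r"node(\d+)$", os.path.dirname(path))
        if match is None:
            continue
        with open(path) as f:
            distances[int(match.group(1))] = [int(x) for x in f.read().split()]
    return dict(sorted(distances.items()))


if __name__ == "__main__":
    for node, row in read_numa_distances().items():
        print(node, row)
```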
> > Interestingly I do see a performance difference, albeit a smaller one, even
> > with OFFSET. I see similar numbers on two different 2 socket machines.
> >
>
> I wonder how significant is the number of sockets. The Azure is a single
> socket with 2 NUMA nodes, so maybe the latency differences are not
> significant enough to affect this kind of tests.
Ah, yes, a single-socket machine might not show that much of an increase, at
least in simpler cases. One of my workstations has two sockets, each with two
NUMA nodes. The latency difference between two NUMA nodes in the same socket
is small, but accessing memory on the other socket costs roughly 1.5-1.7x.
Using Intel's mlc (Memory Latency Checker):
```
Measuring idle latencies for sequential access (in ns)...
                Numa node
Numa node            0       1       2       3
       0          98.6   106.9   157.6   167.9
       1         105.8    99.4   158.4   170.5
       2         157.2   167.4   103.6   105.6
       3         158.4   171.2   104.5   104.3
```
So there's less than 10 ns of extra latency between nodes 0 and 1 (or 2 and
3), but roughly 50-70 ns across sockets...
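The local vs. remote arithmetic can be reproduced directly from the mlc matrix. The sketch below assumes, per the description above, that nodes 0 and 1 sit on one socket and nodes 2 and 3 on the other. `latency_penalties` is a hypothetical helper, not part of mlc.

```python
# Idle latencies (ns) from the mlc output: rows = node issuing the access,
# columns = node holding the memory.
LATENCY_NS = [
    [98.6, 106.9, 157.6, 167.9],
    [105.8, 99.4, 158.4, 170.5],
    [157.2, 167.4, 103.6, 105.6],
    [158.4, 171.2, 104.5, 104.3],
]

# Assumption: nodes 0,1 are on socket 0 and nodes 2,3 on socket 1.
SOCKET_OF_NODE = [0, 0, 1, 1]


def latency_penalties(lat, socket_of_node):
    """Return (same-socket extra ns, cross-socket extra ns, cross-socket ratios)."""
    same_socket, cross_socket, cross_ratio = [], [], []
    for src, row in enumerate(lat):
        local = row[src]  # diagonal entry: node-local latency
        for dst, ns in enumerate(row):
            if dst == src:
                continue
            extra = round(ns - local, 1)
            if socket_of_node[src] == socket_of_node[dst]:
                same_socket.append(extra)
            else:
                cross_socket.append(extra)
                cross_ratio.append(ns / local)
    return same_socket, cross_socket, cross_ratio


if __name__ == "__main__":
    same, cross, ratio = latency_penalties(LATENCY_NS, SOCKET_OF_NODE)
    print(f"same socket:  +{min(same)} to +{max(same)} ns")
    print(f"cross socket: +{min(cross)} to +{max(cross)} ns "
          f"({min(ratio):.2f}x to {max(ratio):.2f}x local)")
```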
> The xeon is a 2-socket machine, but it's also older (~10y).
It's perhaps worth noting that memory access latency has been *in*creasing in
the last generation or two of hardware...
Greetings,
Andres Freund