Re: Adding basic NUMA awareness

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding basic NUMA awareness
Date: 2026-01-13 01:13:40
Message-ID: 2db78610-b480-4aa0-a1b6-57f1c2dcb708@vondra.me
Lists: pgsql-hackers

On 1/13/26 01:24, Andres Freund wrote:
> Hi,
>
> On 2026-01-12 19:10:00 -0500, Andres Freund wrote:
>> On 2026-01-13 00:58:49 +0100, Tomas Vondra wrote:
>>> On 1/10/26 02:42, Andres Freund wrote:
>>>> psql -Xq -c 'SELECT pg_buffercache_evict_all();' -c 'SELECT numa_node, sum(size) FROM pg_shmem_allocations_numa GROUP BY 1;' && perf stat --per-socket -M memory_bandwidth_read,memory_bandwidth_write -a psql -c 'SELECT sum(abalance) FROM pgbench_accounts;'
>>
>>> And then I initialized pgbench with scale that is much larger than
>>> shared buffers, but fits into RAM. So cached, but definitely > NB/4. And
>>> then I ran
>>>
>>> select * from pgbench_accounts offset 1000000000;
>>>
>>> which does a sequential scan with the circular buffer you mention above
>>
>> Did you try it with the query I suggested? One plausible reason why you did
>> not see an effect with your query is that with a huge offset you actually
>> never deform the tuple, which is an important and rather latency sensitive
>> path.
>
> Btw, this doesn't need anywhere close to as much data, it should be visible as
> soon as you're >> L3.
>
> To show why
> SELECT * FROM pgbench_accounts OFFSET 100000000
> doesn't show an effect but
> SELECT sum(abalance) FROM pgbench_accounts;
>
> does, just look at the difference using the perf command I posted. Here on a
> scale 200.
>

OK, I tried with a smaller scale (and larger shared buffers, to make the
data set smaller than NBuffers/4).

On the Azure VM (scale 200, 32GB sb), there's still no difference:

numactl --membind 0 --cpunodebind 0
297.770 ms

numactl --membind 0 --cpunodebind 1
297.924 ms

and on the Xeon (scale 100, 8GB sb), there's a bit of a difference:

numactl --membind 0 --cpunodebind 0
236.451 ms

numactl --membind 0 --cpunodebind 1
298.418 ms

So the local run takes roughly 20% less time. There's also a bigger
difference in the perf output, about 5944.3 MB/s vs. 5202.3 MB/s.
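
For reference, the relative differences can be worked out directly from the
numbers quoted above. This is only a minimal sketch: the helper name
pct_change is illustrative, and it assumes the two bandwidth figures are
listed in the same order as the timings (first vs. second run).

```python
def pct_change(baseline, other):
    """Percent change of `other` relative to `baseline`."""
    return (other - baseline) / baseline * 100.0

# Timings in ms, as reported above:
# first value = --cpunodebind 0, second value = --cpunodebind 1
azure_node0, azure_node1 = 297.770, 297.924
xeon_node0, xeon_node1 = 236.451, 298.418

# How much longer the second run takes, relative to the first
azure_slowdown = pct_change(azure_node0, azure_node1)
xeon_slowdown = pct_change(xeon_node0, xeon_node1)

# The same Xeon gap expressed the other way round:
# how much less time the first run takes, relative to the second
xeon_time_saved = -pct_change(xeon_node1, xeon_node0)

# Bandwidth figures from perf (MB/s); the ordering is an assumption
bw_change = pct_change(5944.3, 5202.3)

print(f"azure: {azure_slowdown:+.2f}% time")
print(f"xeon:  {xeon_slowdown:+.1f}% time ({xeon_time_saved:.1f}% less the other way)")
print(f"xeon bandwidth: {bw_change:+.1f}%")
```

Depending on which run is taken as the baseline, the Xeon gap comes out at
about 26% (longer) or about 21% (shorter), which is where "roughly 20%" comes
from. The Azure gap is well under 0.1%.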

>
> Interestingly I do see a performance difference, albeit a smaller one, even
> with OFFSET. I see similar numbers on two different 2 socket machines.
>

I wonder how significant the number of sockets is. The Azure VM is a single
socket with 2 NUMA nodes, so maybe the latency differences are not
significant enough to affect this kind of test.

The Xeon is a 2-socket machine, but it's also older (~10y).

regards

--
Tomas Vondra
