Re: pgsql: Introduce pg_shmem_allocations_numa view

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Christoph Berg <myon(at)debian(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
Date: 2025-06-24 08:24:53
Message-ID: aFpg1de9ZfS1QgUt@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-committers pgsql-hackers

Hi,

On Tue, Jun 24, 2025 at 03:43:19AM +0200, Tomas Vondra wrote:
> On 6/23/25 23:47, Tomas Vondra wrote:
> > ...
> >
> > Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> > calculation.
> >
>
> I think it's likely something like this.

I think the same.

> I noticed that if I modify
> pg_buffercache_numa_pages() to query the addresses one by one, it works.
> And when I increase the number, it stops working somewhere between 16k
> and 17k items.

Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
pg_numa_query_pages() is called on chunks of at most 16 pages, but fails when
called on more than 16 pages at once.

It's also confirmed by test_chunk_size.c attached:

$ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
$ ./test_chunk_size
1 pages: SUCCESS (0 errors)
2 pages: SUCCESS (0 errors)
3 pages: SUCCESS (0 errors)
4 pages: SUCCESS (0 errors)
5 pages: SUCCESS (0 errors)
6 pages: SUCCESS (0 errors)
7 pages: SUCCESS (0 errors)
8 pages: SUCCESS (0 errors)
9 pages: SUCCESS (0 errors)
10 pages: SUCCESS (0 errors)
11 pages: SUCCESS (0 errors)
12 pages: SUCCESS (0 errors)
13 pages: SUCCESS (0 errors)
14 pages: SUCCESS (0 errors)
15 pages: SUCCESS (0 errors)
16 pages: SUCCESS (0 errors)
17 pages: 1 errors
Threshold: 17 pages

No error if -m32 is not used.

> It may be a coincidence, but I suspect it's related to the sizeof(void
> *) being 8 in the kernel, but only 4 in the chroot. So the userspace
> passes an array of 4-byte items, but kernel interprets that as 8-byte
> items. That is, we call
>
> long move_pages(int pid, unsigned long count, void *pages[.count], const
> int nodes[.count], int status[.count], int flags);
>
> Which (I assume) just passes the parameters to kernel. And it'll
> interpret them per kernel pointer size.
>

I also suspect something in this area...
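
To make the suspected mismatch concrete, here is a minimal sketch of what
happens when a buffer of 4-byte addresses is read as if it held 8-byte
pointers. It only illustrates the hypothesis discussed above, it is not a
confirmed diagnosis of what the kernel does. The helper name read_as_64bit()
is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL or kernel code): fill `n64` 8-byte
 * slots from a buffer that really holds 4-byte addresses, the way a reader
 * expecting 64-bit pointers would. Each 8-byte slot swallows two adjacent
 * 4-byte entries, so the caller must provide at least 2 * n64 entries.
 */
void
read_as_64bit(const uint32_t *addrs32, size_t n64, uint64_t *out)
{
    /* memcpy avoids alignment and aliasing problems of a pointer cast */
    memcpy(out, addrs32, n64 * sizeof(uint64_t));
}
```

With the 32-bit addresses 0x1000, 0x2000, 0x3000, 0x4000 as input, a
little-endian machine reads the first 8-byte slot as 0x0000200000001000, which
matches none of the original addresses, and the four entries are consumed as
only two.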

> If this is what's happening, I'm not sure what to do about it ...

We could work in chunks (16?) on 32-bit builds, but that would probably degrade
performance (we mention it in the doc though). Also, would 16 always be a
correct chunk size?
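
For the sake of discussion, the chunking idea could look roughly like the
sketch below. It is not a proposed patch. The names page_query_fn,
fake_query() and query_pages_chunked() are made up for this example, and
CHUNK_PAGES is an assumption taken from the test output above (whether 16 is
always safe is exactly the open question). fake_query() is only a stand-in
that mimics the observed behaviour (failure above 16 pages), so the loop can
be tried without NUMA support. A real caller would pass a wrapper around the
actual page query instead.

```c
#include <stddef.h>

/* Assumption: largest request size that succeeded in the test output above */
#define CHUNK_PAGES 16

/* Hypothetical callback type: returns 0 on success, non-zero on failure */
typedef int (*page_query_fn) (unsigned long count, void **pages, int *status);

/*
 * Stand-in for the real query, mimicking the observed behaviour: requests
 * for more than 16 pages fail, smaller ones report node 0 for every page.
 */
int
fake_query(unsigned long count, void **pages, int *status)
{
    (void) pages;               /* addresses are not inspected by the stub */

    if (count > 16)
        return -1;
    for (unsigned long i = 0; i < count; i++)
        status[i] = 0;
    return 0;
}

/*
 * Query `count` pages in pieces of at most CHUNK_PAGES, stopping at the
 * first failing chunk and returning its error code.
 */
int
query_pages_chunked(page_query_fn query, unsigned long count,
                    void **pages, int *status)
{
    for (unsigned long off = 0; off < count; off += CHUNK_PAGES)
    {
        unsigned long n = count - off;
        int         rc;

        if (n > CHUNK_PAGES)
            n = CHUNK_PAGES;

        rc = query(n, pages + off, status + off);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

With the stub, a single request for 40 pages fails, while
query_pages_chunked(fake_query, 40, pages, status) succeeds in three calls
(16 + 16 + 8 pages). The cost is one extra call per chunk, which is where the
performance degradation mentioned above would come from.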

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
test_chunk_size.c text/x-csrc 1.6 KB
