Re: pgsql: Introduce pg_shmem_allocations_numa view

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Christoph Berg <myon(at)debian(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
Date: 2025-06-23 20:10:46
Message-ID: 6c9f9f7e-947b-4fc3-bdb6-b0696d7492e5@vondra.me
Lists: pgsql-committers pgsql-hackers

On 6/23/25 21:57, Christoph Berg wrote:
> Re: Andres Freund
>> How confident are we that this isn't actually because we passed a bogus
>> address to the kernel or such? With this patch, are *any* pages recognized as
>> valid on the machines that triggered the error?
>
> See upthread - the first 35 pages were ok, then a lot of -14.
>
>> I wonder if we ought to report the failures as a separate "numa node"
>> (e.g. NULL as node id) instead ...
>
> Did that now, using N+1 (== 1 here) for errors in this Debian i386
> environment (chroot on an amd64 host):
>
> select * from pg_shmem_allocations_numa \crosstabview
>
>                name               │    0     │    1
> ──────────────────────────────────┼──────────┼──────────
>  multixact_offset                 │    69632 │    65536
>  subtransaction                   │   139264 │   131072
>  notify                           │   139264 │        0
>  Shared Memory Stats              │   188416 │   131072
>  serializable                     │   188416 │    86016
>  PROCLOCK hash                    │     4096 │        0
>  FinishedSerializableTransactions │     4096 │        0
>  XLOG Ctl                         │  2117632 │  2097152
>  Shared MultiXact State           │     4096 │        0
>  Proc Header                      │     4096 │        0
>  Archiver Data                    │     4096 │        0
>  .... more 0s in the last column ...
>  AioHandleData                    │  1429504 │        0
>  Buffer Blocks                    │ 67117056 │ 67108864
>  Buffer IO Condition Variables    │   266240 │        0
>  Proc Array                       │     4096 │        0
>  .... more 0s
> (73 rows)
>
>
> There is something fishy with pg_buffercache. If I restart PG, I'm
> getting "Bad address" (errno 14), this time as return value of
> move_pages().
>
> postgres =# select * from pg_buffercache_numa;
> DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:383
> 2025-06-23 19:41:41.315 UTC [1331894] ERROR: failed NUMA pages inquiry: Bad address
> 2025-06-23 19:41:41.315 UTC [1331894] STATEMENT: select * from pg_buffercache_numa;
> ERROR: XX000: failed NUMA pages inquiry: Bad address
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394
>
> Repeated calls are fine.
>
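
For reference, the N+1 error bucket described in the quoted message can be sketched roughly as below. This is a minimal illustration, not the patch itself: tally_pages_by_node and its parameters are hypothetical names, and the status array is assumed to follow the move_pages() convention of a node id on success and a negative errno (such as -14, i.e. -EFAULT) on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code: sum page sizes per NUMA node.
 *
 * status[i] is assumed to be a per-page result in the style of move_pages():
 * a node id (>= 0) on success, or a negative errno such as -14 (-EFAULT).
 * totals must have room for max_node + 2 entries; the last entry
 * (index max_node + 1) collects every page that could not be resolved.
 */
void
tally_pages_by_node(const int *status, size_t npages, size_t page_size,
                    int max_node, uint64_t *totals)
{
    int         error_bucket = max_node + 1;

    assert(max_node >= 0);

    for (int n = 0; n <= error_bucket; n++)
        totals[n] = 0;

    for (size_t i = 0; i < npages; i++)
    {
        int         bucket = status[i];

        /* Negative errno values and out-of-range ids go to the error bucket. */
        if (bucket < 0 || bucket > max_node)
            bucket = error_bucket;

        totals[bucket] += page_size;
    }
}
```

With a single node (max_node = 0), failed pages end up under "1", which matches the layout of the crosstab output above.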

Huh. So it's only the first call that does this?

Can you maybe print the addresses passed to pg_numa_query_pages? I
wonder if there's some bug in how we fill that array. Not sure why it
would happen only on 32-bit systems, though.
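
A rough sketch of the kind of debugging aid meant here, in case it helps. It assumes the array holds one pointer per OS page; fill_page_ptrs and dump_page_ptrs are hypothetical helpers written for illustration and do not reflect how the PostgreSQL code actually builds the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the PostgreSQL implementation: fill ptrs[] with
 * the start address of every OS page overlapping [start, start + len).
 * Returns the number of entries written (at most max_ptrs).
 */
size_t
fill_page_ptrs(char *start, size_t len, size_t page_size,
               void **ptrs, size_t max_ptrs)
{
    uintptr_t   first;
    uintptr_t   end;
    size_t      n = 0;

    /* The masking below requires a power-of-two page size. */
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

    /* Round the start down to a page boundary using pointer-width math. */
    first = (uintptr_t) start & ~((uintptr_t) page_size - 1);
    end = (uintptr_t) start + len;

    for (uintptr_t p = first; p < end && n < max_ptrs; p += page_size)
        ptrs[n++] = (void *) p;

    return n;
}

/* Print every address so NULL, unaligned or out-of-range entries stand out. */
void
dump_page_ptrs(void **ptrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(stderr, "page %zu: %p\n", i, ptrs[i]);
}
```

Calling dump_page_ptrs() right before the query and comparing the output of the first (failing) call with a later (working) one should show whether the array contents differ between the two.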

I'll create a 32-bit VM so that I can try reproducing this.

regards

--
Tomas Vondra
