Re: failed NUMA pages inquiry status: Operation not permitted

From: Christoph Berg <myon(at)debian(dot)org>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: failed NUMA pages inquiry status: Operation not permitted
Date: 2025-12-11 12:29:14
Message-ID: aTq5Gt_n-oS_QSpL@msg.df7cb.de
Lists: pgsql-committers pgsql-hackers

Re: Tomas Vondra
> >> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
> >> attached patch. It still calls numa_available(), so that we don't
> >> silently miss future libnuma changes.
> >>
> >> Can you check this makes it work inside the docker container?
> >
> > Yes your patch works. (Sorry I meant to test earlier, but RL...)
>
> Thanks. I've pushed the fix (and backpatched to 18).

It looks like we are not done here yet :(

postgresql-18 is failing here intermittently with this diff:

12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out 2025-11-10 21:52:06.000000000 +0000
12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out 2025-12-11 11:20:22.618989603 +0000
12:20:24 @@ -6,8 +6,4 @@
12:20:24 -- switch to superuser
12:20:24 \c -
12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
12:20:24 - ok
12:20:24 -----
12:20:24 - t
12:20:24 -(1 row)
12:20:24 -
12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2

That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.
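
For reference, and as an assumption rather than a confirmed diagnosis: move_pages(2) documents that each entry of its status array holds either a node number or a negated errno value. If the view gets its node ids that way, the -2 in the error above might be -ENOENT ("page is not present") rather than a real node id. A minimal sketch for decoding such values; the helper name describe_numa_status is hypothetical and not part of PostgreSQL or libnuma:

```python
import errno
import os


def describe_numa_status(status: int) -> str:
    """Describe one entry of a move_pages(2)-style status array.

    Non-negative values are NUMA node ids; negative values are
    negated errno codes (assumption: Linux errno numbering).
    """
    if status >= 0:
        return f"node {status}"
    code = -status
    # errno.errorcode maps numeric codes to symbolic names like 'ENOENT'.
    name = errno.errorcode.get(code, "unknown errno")
    return f"-{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The value from the failing regression test output above.
    print(describe_numa_status(-2))
```

If that reading holds, the range check would be rejecting a per-page error code rather than a bogus node number, which could also fit the intermittent nature of the failure.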

I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
questing/amd64, where libnuma should take care of this itself, without
the extra patch in PG. There was another case on bullseye/amd64 which
has the old libnuma.

It's been frequent enough that it killed 4 out of the 10 builds
currently visible on
https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
(Though to be fair, only one distribution/arch combination was failing
for each of them.)

There is also one instance of it in
https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/

I currently have no idea what's happening.

Christoph
