`pg_ctl init` crashes when run concurrently; semget(2) suspected

From: Gavin Panella <gavinpanella(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: `pg_ctl init` crashes when run concurrently; semget(2) suspected
Date: 2025-08-10 20:37:50
Message-ID: CALL7chmzY3eXHA7zHnODUVGZLSvK3wYCSP0RmcDFHJY8f28Q3g@mail.gmail.com
Lists: pgsql-hackers

Hi,

Summary: on macOS, semget(2) appears to fail with EINVAL in a case where
Linux reports EEXIST, and PostgreSQL's semaphore creation code needs to
allow for that.

I have many tests which spin up clusters using `pg_ctl init`, each in its
own single-use temporary directory. Each test is run for every PostgreSQL
installation found on the host machine. These tests are often run
concurrently. Since adding PostgreSQL 17 to the mix, I've been getting
sporadic failures on macOS:

FATAL: could not create semaphores: Invalid argument
DETAIL: Failed system call was semget(176163502, 20, 03600).
child process exited with exit code 1

I think it's related to the increase of SEMAS_PER_SET
in 38da053463bef32adf563ddee5277d16d2b6c5a (later reverted
in 810a8b1c8051d4e8822967a96f133692698386de) combined with the behaviour of
semget(2) on macOS.

I think the bug manifests because:

- I create two clusters concurrently using `pg_ctl init`. One cluster is
PostgreSQL 17; the other is PostgreSQL 16 or earlier.
- Their data directories are separate but created close enough in time
to have sequential inodes. This is relevant because the inode is used to
seed the semaphore key.
- Somehow (waves hands) semget(2) in PostgreSQL 17 is called with a key
that refers to a preexisting semaphore set. On Linux, because of the
IPC_CREAT | IPC_EXCL flags, the call returns -1 and sets errno to EEXIST.
On macOS, it sets errno to EINVAL instead, likely because the requested
number of semaphores is greater than the number in the existing set. This
happens in the InternalIpcSemaphoreCreate function, which treats the
EINVAL as fatal and aborts the process.
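
The hypothesis in the last bullet could be checked in isolation with
something like the sketch below. It is not PostgreSQL code: the function
name, the starting key 0x5047, and the retry limit are all made up for
illustration. It creates a small semaphore set, then asks for a larger one
under the same key with IPC_CREAT | IPC_EXCL, and reports the errno of the
second call.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical probe, not PostgreSQL code.
 *
 * Creates a set of `small` semaphores under an unused key, then requests
 * `large` semaphores under the same key with IPC_CREAT | IPC_EXCL.
 * Returns the errno from the second call, 0 if that call unexpectedly
 * succeeded, or -1 if no set could be created in the first place.
 */
int
errno_for_colliding_semget(int small, int large)
{
    key_t   key = 0x5047;   /* arbitrary starting key (an assumption) */
    int     first = -1;
    int     second;
    int     result;
    int     tries;

    assert(small > 0 && small < large);

    /* Walk forward until we find a key that nobody else is using. */
    for (tries = 0; tries < 1000; tries++, key++)
    {
        first = semget(key, small, IPC_CREAT | IPC_EXCL | 0600);
        if (first >= 0)
            break;          /* key still names the set we just created */
    }
    if (first < 0)
        return -1;

    /* Same key, more semaphores: this is the suspected collision. */
    second = semget(key, large, IPC_CREAT | IPC_EXCL | 0600);
    result = (second < 0) ? errno : 0;

    /* Remove whatever we created. */
    if (second >= 0)
        semctl(second, 0, IPC_RMID);
    semctl(first, 0, IPC_RMID);

    return result;
}
```

Calling errno_for_colliding_semget(17, 20) on each platform and printing
strerror() of the result should show whether the two kernels really
disagree. The 20 is the count from the failing call in the log above; the
17 is my assumption about what an older release requests.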

The attached patch fixes the issue, I think, and contains another
description of this mechanism. On EINVAL, it makes one additional call to
semget(2) for the same key, this time requesting zero semaphores, to probe
for a set that already exists.
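
For readers who do not want to open the attachment, the idea can be
sketched roughly as follows. This is not the patch itself: the wrapper name
semget_excl_normalized is hypothetical, and treating EACCES from the probe
as "a set exists" is my assumption. The point is only that a request for
zero semaphores cannot fail the size comparison, so it answers the question
"is there already a set under this key?".

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical wrapper, not the attached patch.
 *
 * Tries to create a new semaphore set exclusively. If that fails with
 * EINVAL, probes the same key with nsems = 0 and no creation flags. If the
 * probe shows that a set already exists, errno is rewritten to EEXIST so
 * that a caller can treat it as a key collision and try another key.
 * Returns the semaphore set id, or -1 with errno set.
 */
int
semget_excl_normalized(key_t key, int nsems, int perms)
{
    int semid;

    assert(nsems > 0);

    semid = semget(key, nsems, IPC_CREAT | IPC_EXCL | perms);
    if (semid >= 0 || errno != EINVAL)
        return semid;

    /* A zero-sized request only asks whether the key is in use. */
    if (semget(key, 0, 0) >= 0 || errno == EACCES)
        errno = EEXIST;     /* a set is already there: a collision */
    else
        errno = EINVAL;     /* no set found: the EINVAL was genuine */

    return -1;
}
```

On Linux the probe should never run in the collision case, since the first
call already reports EEXIST, so the extra system call is only paid on
platforms that return EINVAL.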

The patch is relative to master, but I developed it against REL_17_5; it
should apply cleanly to both. I think it would be good to backport a fix to
17 too.

If anyone is feeling nerd-sniped, some better proof of the "waves hands"
bit would be useful, because that was a working hypothesis that led to a
working fix, and I have not yet had time to investigate further.

Please consider the patch for review. Thanks!

Gavin.

Attachment Content-Type Size
v1-0001-When-getting-EINVAL-from-semget-2-probe-for-EEXIS.patch application/octet-stream 2.3 KB
