From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Wolfgang Walther <walther(at)technowledgy(dot)de> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: PostgreSQL fails to start inside Nix' darwin sandbox |
Date: | 2025-09-10 22:45:50 |
Message-ID: | CA+hUKGJoOCA5PmhqU5P8gWe2Am=O-f0ubDjUed2kPuLRWe-UEg@mail.gmail.com |
Lists: | pgsql-bugs |
On Tue, Sep 9, 2025 at 9:33 PM Wolfgang Walther <walther(at)technowledgy(dot)de> wrote:
> Nix on Linux runs all builds and tests in a sandboxed environment by
> default. Nix on Darwin/macOS doesn't enable the sandbox by default,
> because macOS' native sandboxing capabilities are limited. We'd like to
> enable the sandbox by default in the future. Currently, this prevents
> running PostgreSQL's test suite or building extensions with cargo-pgrx,
> both of which require *running* initdb / postgres inside the sandbox.
>
> This is because the Darwin sandbox doesn't allow creating System V
> shared memory segments. Nix' Linux sandbox is able to create IPC
> namespaces, which allows creating these safely. To our knowledge it's
> not possible to create these namespaces with the native darwin
> sandboxing capabilities. Enabling IPC regardless would allow
> communicating with other sandboxes and the host system, defeating the
> point of the sandbox.
>
> System V shared memory segments are used by PostgreSQL to provide a lock
> on the data directory, as explained in sysv_shmem.c. The comment also
> mentions the possibility to introduce a compile and/or run-time test
> here. For our use-case, a run-time test seems much better, because we'd
> want the same binaries to not do this inside the sandbox, but work as
> before when actually run on the host.
>
> Right now, initdb fails with this error:
>
> FATAL: could not create shared memory segment: Operation not permitted
> DETAIL: Failed system call was shmget(key=80109247, size=56, 03600).
>
> It would be great if this was fixed to allow running PostgreSQL in this
> environment.
I've run into variations of this problem in a couple of other contexts:
* older FreeBSD jails (a kind of sandbox) had approximately the same
problem, but modern versions support per-jail System V namespaces so
that problem went away
* Capsicum[1] rejects all system-wide namespace-like concepts
* Android (Linux) disables System V IPC, but (perhaps interestingly)
termux builds PostgreSQL against libandroid-shmem[2][3] to emulate the
System V shmem stuff
I haven't looked into how libandroid-shmem works and whether it really
provides the interlocking semantics we want or whether you might be
able to port it easily. It appears to call into other central Android
libraries, so I doubt it, but you might get some ideas.
The next problem will be System V semaphores. I posted a patch[4]
that uses macOS futexes to implement semaphores (pretty much the same
way libc does on some other systems), which would fix that version of
the problem. But you could presumably already use the more wasteful
named POSIX semaphores.
For the tiny "interlocking" memory segment, which we use on all Unixen
with no alternative, I agree that it would be nice to get rid of it.
Off-the-cuff ideas:
* Perhaps the postmaster could take an exclusive lock on a file at
startup, but only briefly, and then backends could individually
share-lock it at startup. You would then have to defend against a
race where the postmaster exits concurrently, and believe that file
locking works correctly on all file systems.
* Or perhaps the postmaster could bind to a dummy AF_UNIX socket under
pgdata at startup, since (I think) no process can bind to that address
again until all children that inherited the socket have exited.
Perhaps that socket could eventually be merged with ideas floating
around about a general control socket that could solve a few other
problems we have.
* Or perhaps there is a way to use a named pipe, since you can tell
whether anyone still holds the other end of it (read vs write).
Perhaps that could be merged with the existing postmaster death pipe
so we don't need a new descriptor.
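The file-locking idea can be sketched roughly as follows. This is a
minimal illustration in Python, not PostgreSQL code: the function
names, the lock file path, and the choice to downgrade the postmaster's
lock from exclusive to shared are all assumptions made for the example.
It relies only on flock() semantics, where locks taken through separate
open() calls conflict with each other.

```python
import fcntl
import os


def postmaster_acquire(lock_path):
    """Hypothetical postmaster startup check.

    Returns a descriptor holding a shared lock, or None if some other
    process (another postmaster or a leftover backend) still holds a lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Succeeds only if nobody holds a shared or exclusive lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Hold the exclusive lock only briefly: downgrade so that backends
    # can take their own shared locks. flock() does not guarantee that
    # this conversion is atomic, so a real implementation would have to
    # close that window.
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd


def backend_acquire(lock_path):
    """Hypothetical backend startup: take an individual shared lock."""
    # A separate open() gives the backend its own lock, which survives
    # the postmaster's exit.
    fd = os.open(lock_path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd
```

With this arrangement, a new postmaster_acquire() call returns None for
as long as any backend that called backend_acquire() is still running,
even after the original postmaster has gone away. The sketch does not
handle the race with a concurrently exiting postmaster, and it does not
address whether flock() behaves the same way on every file system.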
[1] https://www.cl.cam.ac.uk/research/security/capsicum/
[2] https://github.com/termux/termux-packages/blob/master/packages/postgresql/src-backend-Makefile.patch
[3] https://github.com/termux/libandroid-shmem
[4] https://www.postgresql.org/message-id/flat/CA%2BhUKGKRQrJhVYBkmLJZsScJ434qiduWzzpB0-0_FW8z1kTjcw%40mail.gmail.com#19f7d84d058a908865bafbf82233a07f