| From: | Ilmar Yunusov <tanswis42(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
| Subject: | Re: File locks for data directory lockfile in the context of Linux namespaces |
| Date: | 2026-06-05 12:37:26 |
| Message-ID: | 178066304625.594057.12185296716401427152.pgcf@coridan.postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
Hi,
I looked at v2, focusing on apply/build status and the PID namespace scenario
described in the cover letter.
I used the v2 patch from Dmitry's 2026-01-17 message, on origin/master at
4cb2a9863d89b320f37eb1bd76822f6f65e59311.
The patch applies cleanly with git am, and git diff --check reports no issues.
I built locally with:
./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
make -s -j8
make -s install
The build completed successfully. Configure found F_OFD_SETLK on this macOS
host too.
I also built and tested on Linux with the same configure options:
make -s -j3
make -s install
There, configure also found F_OFD_SETLK, and:
make -C src/test/regress check
passed; all regression tests passed.
For the namespace behavior, I used two PostgreSQL builds on the same Linux host:
unpatched master and v2. The test starts the first postmaster in one
pid/ipc/net namespace, then tries to start a second postmaster in another
pid/ipc/net namespace on the same data directory.
On unpatched master, both postmasters started. Both saw themselves as PID 2 in
their own PID namespace, and the second postmaster rewrote postmaster.pid. After
the second startup, the lock file contained the second socket directory:
2
.../pidns-base/data
1780661958
65437
.../pidns-base/sock2
With v2, the first postmaster started, and the second startup failed with:
FATAL: cannot lock the lock file "postmaster.pid"
HINT: Another server is starting.
So the patch fixes the concrete Linux PID namespace failure mode I tested.
One behavior I wanted to ask about: v2 also OFD-locks Unix socket lock files,
not only the data directory lockfile. That follows from calling
FlockDataDirLockFile() from the generic CreateLockFile() path, and I also saw
it at runtime. With one v2 postmaster using a Unix socket directory, /proc/locks
showed OFD write locks for both files:
.../data/postmaster.pid
.../sock/.s.PGSQL.65436.lock
The corresponding postmaster fds were:
/proc/835058/fd/5 -> .../data/postmaster.pid
/proc/835058/fd/8 -> .../sock/.s.PGSQL.65436.lock
I had expected the new lock to apply only to postmaster.pid, because the patch
title, commit message, helper name, and comments all describe the data directory
lockfile. The socket lockfile behavior therefore looked like either an
intentional scope expansion that should be named as such, or an accidental side
effect of using the generic CreateLockFile() path.
Is locking the socket lock file intentional here? If so, maybe the helper name,
comment, and fd tracking should reflect that broader scope. If not, perhaps the
new lock should be applied only for the isDDLock case; otherwise v2 changes
socket lockfile semantics too, not only postmaster.pid.
While reading that code, I also noticed a small error-path issue: DataDirLockFD
starts as 0, but if fcntl(F_OFD_SETLK) fails with a non-EAGAIN error the code
only emits a warning and does not assign a duplicate fd. UnlinkLockFiles()
then closes DataDirLockFD unconditionally. Initializing it to -1, closing it
conditionally, and checking dup(fd) would make that path more explicit.
Aside from those questions, this looks like a useful best-effort improvement to
me, and the namespace failure mode appears to be addressed by the OFD lock in
the Linux test above.
I did not test NFS behavior, older stable branches, Windows behavior, or
installcheck-world. Unprivileged namespace creation was not permitted on the
Linux host I used, so the namespace repro was run with sudo.
Regards,
Ilmar Yunusov
The new status of this patch is: Waiting on Author
| From | Date | Subject | |
|---|---|---|---|
| Next Message | wenhui qiu | 2026-06-05 12:51:59 | Re: [PATCH] vacuumdb: Add --exclude-database option |
| Previous Message | Nazir Bilal Yavuz | 2026-06-05 12:23:52 | Re: Upload only the failed tests logs to the Postgres CI (Cirrus CI) |