Re: injection_points: Switch wait/wakeup to use atomics rather than latches

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: injection_points: Switch wait/wakeup to use atomics rather than latches
Date: 2026-06-15 00:14:31
Message-ID: ai9D5_4mBSdLixOP@paquier.xyz
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 14, 2026 at 02:23:54PM +0500, Andrey Borodin wrote:
> The same revision also closes an unrelated init race: when two processes
> create the backing file at once, the loser could map it before the winner
> sized it (SIG-something on first touch). The attach path now waits for the
> file to reach full size before mapping.
>
> CI is green. Same two patches on top of your atomics commit. Let me know
> if this prototype goes radically wrong way.

You may find my reply surprising (or not), but using a client tool to
bypass the SQL protocol is super invasive in terms of the in-core
changes, and I'm -1 on that.

Test tooling should rely on simple facilities, and you are
re-implementing quite a few things here that come with a new class of
bugs and what look like design issues to me due to the invasiveness:
- Reimplementation of the registry protocol for file mapping. That
may be useful for other things, and there may be use for a refactored
in-core API, but I don't see why we should use it here:
- Mirroring of internal structs for shmem manipulation.
- Core data updates with zero locking.

I am wondering if we are not overcomplicating things here.. How about
this idea instead of 0002 and 0003, for paths that cannot rely on some
SQL:
- Let's use a file-based markup, located in a injection_points/ in the
data folder.
- When attaching a wait point, write a marker, say wait_$POINT.
- On wakeup, write a wakeup_$POINT.
- The wait routine does periodic checks of the wakeup_$POINT file,
using a stat().

It depends on how much we want to achieve, but forcing a postmaster to
wait at an early startup sequence would then be something like:
- Add the a wait markup.
- At startup, shared_preload_libraries scans the pg_injection_points/
repo, fills in its shmem state by calling InjectionPointAttach().
- postmaster hits the injection point, calls injection_wait()
- Test does what it wants while the postmaster waits, manipulates
states.
- Test script drops a wakeup file.
- Postmaster sees the file, continues its startup sequence.
This means one cannot set an injection point until
shared_preload_libraries is loaded, of course. Cleanup logic feels
kind of nice: test cleans his stuff, or just remove
pg_injection_points/ at shutdown with an exit callback. No platform
specific logic, only FS checks.

This means a bit higher latency, but it eliminates a full class of
bugs due to the fact that it is simpler. This would need to live
alongside patch 0001 so as a _PG_init() can be loaded when the
injection_points lib is loaded; we're still going to need it to fill
the shmem info with the wait slot being occupied.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Tom Lane 2026-06-14 23:33:54 Re: Use \if/\endif to remove non-libxml2 expected output in regression tests