| From: | Greg Lamberson <greg(at)lamco(dot)io> |
|---|---|
| To: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Extensible sync handler registration (register_sync_handler) |
| Date: | 2026-04-10 21:46:44 |
| Message-ID: | IA1PR07MB983072521EE7FDEE98902534A9592@IA1PR07MB9830.namprd07.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hackers,
Motivation:
I am working on Lamstore, an extent based storage backend whose data lives
at byte offsets inside a volume file rather than as md segment files.
Lamstore uses a custom storage manager to route backend I/O, which works
today against stock PostgreSQL and Percona Server for PostgreSQL 18. The
checkpoint fsync pipeline, however, is a dead end: syncsw[] in sync.c is
a static const SyncOps[] indexed by the SyncRequestHandler enum, and there
is no way for an extension to install its own dispatch entry. An extension
cannot reuse SYNC_HANDLER_MD because mdsyncfiletag() resolves paths via
relpathperm() plus OpenTransientFile() plus fsync(), which does not reach
Lamstore storage.
The attached two patch series adds a dynamic register_sync_handler() API
to sync.c, parallel to RegisterCustomRmgr and the smgr_register shape
being developed on CF 5616. Built in handlers (MD, CLOG, commit_ts,
multixact_offset, multixact_member) keep their existing enum indices 0
through 4. Extensions get IDs starting at a new SYNC_HANDLER_FIRST_DYNAMIC.
The patch is strictly additive: zero WAL format changes, zero FileTag
layout changes, zero shared memory changes, zero catalog involvement.
Background:
Thomas Munro's 2019 refactor of the checkpointer fsync queue (commit
3eb77eba5a5, PG12) explicitly anticipated non md consumers. The commit
message reads:
"For now only md.c uses the new interface, but other users are being
proposed. Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one."
Seven years later the framework is in place but the registration
mechanism has never been added. This patch closes that gap.
Two reviewers have already voiced needs that this API makes trivial. In
Tristan Partin's 2024-01-12 message on the CF 5616 thread
(CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK=PCNWEMrA(at)mail(dot)gmail(dot)com),
Heikki Linnakangas wrote:
"[I would like] a debugging tool that checks that we're not missing
any fsyncs. I bumped into a few missing fsync bugs with unlogged
tables lately and a tool like that would've been very helpful."
And Andres Freund wrote:
"I've been thinking that we need a validation layer for fsyncs, it's
too hard to get right without testing. My thought was that we could
keep a shared hash table of all files created / dirtied at the fd
layer, with the filename as key and the value the current LSN. We'd
delete files from it when they're fsynced."
Both needs become straightforward with register_sync_handler(): a
validation extension loads in shared_preload_libraries, registers its
own handler, and observes every sync request flowing through
ProcessSyncRequests() regardless of the underlying storage manager.
This is strictly stronger than the fsync_checker extension in CF 5616
v5+, which wraps md and can only observe md backed data.
What it does now:
syncsw[] is declared as:
static const SyncOps syncsw[] = {
[SYNC_HANDLER_MD] = { mdsyncfiletag, ... },
[SYNC_HANDLER_CLOG] = { clogsyncfiletag, ... },
[SYNC_HANDLER_COMMIT_TS] = { committssyncfiletag, ... },
[SYNC_HANDLER_MULTIXACT_OFFSET] = { multixactoffsetssyncfiletag, ... },
[SYNC_HANDLER_MULTIXACT_MEMBER] = { multixactmemberssyncfiletag, ... },
};
Adding a new handler requires patching core. No extension path exists.
What it will do:
The patch converts syncsw[] from static const SyncOps[] to a heap
allocated static SyncOps * grown via doubling repalloc, bounded by
shared_preload_libraries count. Built in handlers are pre-registered at
their existing enum indices by a new InitSyncHandlers() called from
PostmasterMain before process_shared_preload_libraries(). Extensions
call register_sync_handler() from _PG_init and receive a stable int16
handler ID, which they store in their FileTag.handler field at register
time:
/* src/include/storage/sync.h */
typedef struct SyncOps {
int (*sync_syncfiletag) (const FileTag *ftag, char *path);
int (*sync_unlinkfiletag) (const FileTag *ftag, char *path);
bool (*sync_filetagmatches)(const FileTag *ftag, const FileTag *candidate);
} SyncOps;
extern int16 register_sync_handler(const SyncOps *ops, const char *name);
typedef enum SyncRequestHandler {
SYNC_HANDLER_MD = 0,
SYNC_HANDLER_CLOG = 1,
SYNC_HANDLER_COMMIT_TS = 2,
SYNC_HANDLER_MULTIXACT_OFFSET = 3,
SYNC_HANDLER_MULTIXACT_MEMBER = 4,
SYNC_HANDLER_FIRST_DYNAMIC = 5,
SYNC_HANDLER_MAX = INT16_MAX,
SYNC_HANDLER_NONE = -1,
} SyncRequestHandler;
Dispatch is syncsw[idx].fn() both before and after. Each caller that
dispatches through the table (ProcessSyncRequests, SyncPostCheckpoint,
RememberSyncRequest) caches the base pointer in a local SyncOps *ops
at function entry so the compiler keeps it in a register for the
lifetime of the function, matching the register allocation the
pre-patch static const array received. Verified with objdump on GCC
14.2 at -O2 that the per-dispatch instruction sequence (movswq, mov,
mov, lea, call *(%r14,%rax,8)) is byte identical between stock and
patched builds. The only remaining delta is one additional memory
load at function entry to fetch the syncsw pointer (mov 0x0(%rip),%r14
versus stock's lea 0x0(%rip),%r14): a single L1 cache hit, paid once
per call to ProcessSyncRequests, not per dispatch.
The 0002 patch adds src/test/modules/test_sync_handler with a TAP test
verifying registration, dispatch, HASH_BLOBS coalescing, and cycle_ctr
skip semantics. Test layout mirrors fsync_checker in CF 5616 v5+.
Risk:
Strictly additive. Zero WAL format changes. Zero FileTag layout
changes. Zero shared memory changes. Zero catalog involvement.
Per-dispatch assembly is byte identical to stock (see above). Built
in enum values (SYNC_HANDLER_MD=0 et al) are preserved. ABI safe.
Recovery path unaffected: extensions cannot register during recovery
because InitSyncHandlers() runs in PostmasterMain before any child
process, and the built ins are all pre-registered at fixed IDs. No
behaviour change when no extension registers.
Pre-empting Andres Freund's 2023 objections to CF 4428 v1:
Your four review comments on the earlier smgr_register prototype
(postgrespro.com thread id 2654666, 2023-07-01) apply here by analogy.
Addressing them directly:
1. "Not a good place to initialize, we'll need it in multiple places
that way. How about putting it into BaseInit()?"
Our InitSyncHandlers() is called from a single site, PostmasterMain,
immediately before process_shared_preload_libraries(). Built ins
register via a private sync_handler_register_internal() from that
one site. Child processes inherit the array via fork(), which is a
full POSIX memory barrier. No per-process re-initialization. This
is the same lifecycle pattern as RegisterCustomRmgr.
2. "This adds another level of indirection. I would rather limit the
number of registerable smgrs than do that."
Doubling repalloc is bounded by shared_preload_libraries count
(small). Amortized O(1) cost at preload time. Registration happens
exactly once per extension per postmaster startup, before any
backend exists. Per-dispatch hot path cost is zero by construction:
each caller hoists syncsw into a local SyncOps *ops at function
entry, so the compiler keeps the base pointer in a register and
the per-entry dispatch compiles to byte-identical assembly as the
pre-patch static const array (verified with objdump, see Risk
paragraph above). The only measurable delta is one additional
memory load at function entry, paid once per ProcessSyncRequests
call (not per dispatch). If you still prefer a hard cap I am happy
to add SYNC_HANDLER_MAX_DYNAMIC as a compile-time constant.
3. "Huh, what's that about?" (on pg_compiler_barrier in registration)
Not included. Registration is single threaded during preload. No
barrier needed.
4. "It looked to me like you determined this globally, why do we need
it in every entry then?" (on per entry size fields)
Not applicable. SyncOps is pure function pointers. No per entry
size tracking.
Relationship to CF 5616:
This patch is a narrow focused companion to CF 5616 (Extensible storage
manager API), not a dependency or replacement. CF 5616 makes smgrsw[]
dynamic; this patch does the same for syncsw[]. The two are orthogonal:
none of 5616 v6's six sub patches touch sync.c or sync.h. This patch
applies cleanly against master today without 5616, and introduces none
of 5616's unresolved design questions (hook-vs-registration, catalog
recovery, per tablespace vs per relation config, GUC chaining). If 5616
lands later the two compose naturally: extensions call smgr_register()
and register_sync_handler() back to back in their _PG_init.
Percona carryover:
The patch also applies cleanly to percona/postgres PSP_REL_18_STABLE
for our production deployment. sync.c and sync.h have no Percona
specific changes on that branch (verified via gh api), so the same
patch applies identically modulo mechanical offsets. Upstream master
is the primary submission venue. A companion PR against Percona's
branch will reference this thread.
Verified locally:
* make check-world green on master (PG 19devel) with both patches
applied
* make -C src/test/modules/test_sync_handler check green on both
upstream master and percona/postgres PSP_REL_18_STABLE
I will open a CF 59 (PG20-1) entry today and link this thread's
Message-Id.
Thanks for reading. Feedback, counterproposals, and design pushback
all welcome.
Greg Lamberson
Lamco Development
greg(at)lamco(dot)io
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Make-sync.c-syncsw-extensible-via-register_sync_h.patch | application/octet-stream | 21.1 KB |
| v1-0002-Add-test-module-for-sync-handler-registration.patch | application/octet-stream | 16.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Haibo Yan | 2026-04-10 21:48:32 | Re: Extract numeric filed in JSONB more effectively |
| Previous Message | Michael Paquier | 2026-04-10 20:34:34 | Re: pg17: XX000: no relation entry for relid 0 |