Re: Comments on Custom RMGRs

From: Greg Lamberson <greg(at)lamco(dot)io>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: lepihov(at)gmail(dot)com, asifrana(at)gmail(dot)com
Subject: Re: Comments on Custom RMGRs
Date: 2026-04-24 14:07:32
Message-ID: 177703965221.23552.11169540665349000505@lamco.io
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andrei,

Thanks for reviving this thread and for responding to Heikki's
"design this as an independent hook, separate from rmgrs" point.
Moving away from the rmgr callback was the right call. I read back
through the 2022 to 2024 discussion to get my head around the prior
concerns, and have a number of comments on v1.

1. Regression on phased invocation.

Danil's rmgr_003.v3.patch called the hook at two points,
RMGR_CHECKPOINT_BEFORE_BUFFERS and RMGR_CHECKPOINT_AFTER_BUFFERS, in
response to Andres's 2022 request for "multiple calls to such a
callback, with a parameter where in CheckPointGuts() we are." Jeff
Davis specifically approved that direction:

You've moved the discussion forward: ... 2. The hook is called at
multiple points. Those at least partially address the concerns
raised by Andres and Robert.
[Jeff Davis, 2024-03-29]

v1 reverts to a single call point (after CheckPointBuffers, before
ProcessSyncRequests), with no phase parameter. I think this needs
justification in the commit message, or better, the phased model
should be reinstated. Heikki's 2024 post described a use case where
"we wanted to write a new WAL record at shutdown, and in
CheckPointGuts(), it's already too late for that. It needs to be
done earlier, before starting the shutdown checkpoint." The current
single-point hook cannot serve that use case.

If you believe the single-point model is sufficient, it would help
to enumerate which of the earlier use cases (BDR-style data
structures, unlogged indexes, Stonebraker-style non-smgr storage,
Heikki's pre-shutdown WAL record) are actually servable by v1, and
which are being deliberately left out of scope.

2. Restartpoint behavior is unspecified.

CheckPointGuts() is shared between regular checkpoints and
restartpoints on hot standbys. The patch does not say whether
Checkpoint_hook fires during restartpoints. This matters to
extension authors: a hook that writes application state to disk at
every restartpoint is very different from one that fires only on
primary-initiated checkpoints. Please document the intended
behavior, and if both cases are supposed to fire, consider whether a
CHECKPOINT_IS_RESTARTPOINT flag bit is warranted.

3. Shutdown and end-of-recovery handling.

The example skips work on CHECKPOINT_END_OF_RECOVERY but not on
CHECKPOINT_IS_SHUTDOWN. The right behavior at shutdown is a real
design question (do we flush persistent state one last time, or do
we skip because the next start will reinitialize?). A comment in
xlog.c near the hook invocation describing the contract would help.

4. Hook chaining is broken in the example.

In test_dsm_registry.c:

static Checkpoint_hook_type prev_Checkpoint_hook = NULL;
...
void _PG_init(void)
{
prev_Checkpoint_hook = Checkpoint_hook;
Checkpoint_hook = pgss_Checkpoint;
}

prev_Checkpoint_hook is saved but never invoked inside
pgss_Checkpoint(). Every other chainable hook in PG (emit_log_hook,
planner_hook, ProcessUtility_hook, ExecutorStart_hook, and so on)
invokes the prior hook. Without that, if two extensions both
install Checkpoint_hook, whichever was loaded first gets silently
dropped. This is a one-line fix but the example sets the wrong
precedent for future consumers.

5. Copy-paste leftovers from the earlier pgss example.

#define TDR_DUMP_FILE PGSTAT_STAT_PERMANENT_DIRECTORY \
"/pg_stat_statements.stat"

test_dsm_registry writes to a file literally named
pg_stat_statements.stat. If both test_dsm_registry and the real
pg_stat_statements extension are loaded, the two write to the same
path. Likewise the function name pgss_Checkpoint lives inside
test_dsm_registry.c where it should be tdr_Checkpoint or similar.

6. Preload-phase guard.

The patch allows assignment to Checkpoint_hook at any time. If an
extension is loaded via session_preload_libraries or via a runtime
LOAD, it will mutate that backend's own Checkpoint_hook pointer,
which on fork-based platforms diverges from the postmaster's, and
on EXEC_BACKEND platforms interacts oddly with each child's
re-execution of process_shared_preload_libraries. Consider
rejecting installation after process_shared_preload_libraries_done
is set, similar to how register_sync_handler() in CF 6689 is
structured. Or, if the intent is to allow later installation,
document the per-process semantics clearly.

7. Error level in the callback.

Danil Anisimow noted in 2024 that ereport(ERROR) from an rmgr
checkpoint callback caused the server to fail to start entirely
after end-of-recovery, leaving the admin with no way forward. The
same failure mode exists for Checkpoint_hook. The doc comment
above the hook declaration should advise extension authors to use
LOG level and to return on failure, never ERROR. The example
already does this correctly; the advice just needs to be captured
where future implementers will read it.

8. Documentation.

For a new public hook with this much discussion behind it, a
header comment describing the contract, what is safe to do in the
callback, which phases are supported (one in v1, see item 1),
behavior at shutdown and end-of-recovery and restartpoint, error-
level expectations, and chaining expectations, would go a long way
toward answering the "we should document what is legal to do at
that point" concern Andres raised in 2022. SGML coverage parallel
to custom-rmgr.sgml would be ideal for a feature of this kind, but
at minimum the header declaration needs more than its current one
sentence.

9. Test module placement.

Overloading test_dsm_registry with 165 lines focused on the
checkpoint hook (load_htab, pgss_Checkpoint, file format constants)
blurs what that module was created to demonstrate. Consider a
dedicated src/test/modules/test_checkpoint_hook with its own
Makefile, meson.build, and TAP test, similar to test_custom_rmgrs.
That also gives the example a natural home for a cleaner demo of
the round-trip (insert, checkpoint, crash, start, verify).

10. Minor.

- Naming: Checkpoint_hook is capitalized inconsistently with
emit_log_hook, planner_hook, and ExecutorStart_hook style.
Consider checkpoint_hook for consistency.
- The comment in xlog.c says "Allow a plugin that depends on a
custom RMGR..." but your cover letter explicitly frames the hook
as an alternative to registering an RMGR. The comment should
match the intended use.
- The test assertion "Value inserted after the checkpoint was
lost" demonstrates the correct no-WAL-logging behavior, but a
comment explaining that this is intentional (data between
checkpoints is not crash-safe through this hook alone, extensions
that need stronger guarantees must combine the hook with a
custom RMGR or with WAL-logged records) would help a reader
understand the intended use pattern.

Overall the direction is right and much better than the rmgr-coupled
versions. I think the main things that need to happen for this to
move toward commit are (1) decide on and justify the phased vs
single-point model, with reference to the use cases enumerated
upthread, and (2) write the contract documentation that Andres
asked for three years ago. The implementation details in items
4 through 10 are quick fixes once the design questions are
settled.

Thanks,
Greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mingwei Jia 2026-04-24 14:12:02 [PoC] Umbra: a remap-aware smgr prototype on PostgreSQL master
Previous Message Fujii Masao 2026-04-24 13:08:04 Re: [PATCH] Don't call ereport(ERROR) from recovery target GUC assign hooks