| From: | Greg Lamberson <greg(at)lamco(dot)io> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | sjh910805(at)gmail(dot)com |
| Subject: | Re: [PATCH] Don't call ereport(ERROR) from recovery target GUC assign hooks |
| Date: | 2026-04-24 01:10:44 |
| Message-ID: | 177699304484.3795.4531051245757369781@lamco.io |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Shin,
Thanks for taking this one on. The XXX comment has been there for a
long time and fixing a known contract violation is worthwhile. The
mechanical part of the patch, moving the conflict check from the
assign hooks into validateRecoveryParameters(), looks right to me.
A few things worth discussing before this gets a committer's eye:
1. Behavioral change when recovery is not requested.
The patch's commit message notes that conflicting recovery target
settings are now silently accepted when ArchiveRecoveryRequested is
false, and argues this is "arguably more correct" because those GUCs
have no effect outside recovery. I am not sure I agree. Today a
misconfigured postgresql.conf (say, both recovery_target_time and
recovery_target_xid present) is caught immediately at postmaster
start. After the patch, that same misconfiguration boots
successfully, and the operator only finds out later when they add
recovery.signal and the startup process FATALs. That is a real
downgrade in error-detection timing.
Is it feasible to keep the early-detection behavior without violating
the assign-hook contract? One option: emit a WARNING (not ERROR)
from the assign hook, or defer the check to a post-config-file pass
(GUC_check_hook chain, or a one-shot check in PostmasterMain before
the startup process is forked). I do not have a strong opinion on
the mechanism, but I think the user-visible behavior (conflicting
recovery_target_* settings are caught at server start) is worth
preserving if possible.
If preserving that is not feasible, I think the commit message should
flag the timing change more prominently, since it is a user-visible
change to when postgres refuses to start.
2. Test coverage of the conflict-in-recovery path.
The existing 003_recovery_targets.pl has a test (not modified by
this patch, as far as I can tell) that exercises the multiple-targets
rejection. With v1, that rejection fires from
validateRecoveryParameters() rather than from an assign hook, so the
error message source changes even if the text is identical. CFBot
is green, so the existing regex must still match, but it would be
good to confirm which test covers this and to verify the test's
assertions actually validate the new FATAL path, not just "server
failed to start for some reason".
3. errdetail wording.
errdetail("At most one of \"recovery_target\", ..., "
"\"recovery_target_xid\" may be set.")
The error-message style guide (doc/src/sgml/error-style-guide.sgml)
says to avoid "may" because it reads as permission rather than
ability. Suggest "can be set" instead. Trivial, but since we are
already touching this message.
4. Test cleanup fragility.
The new test case does:
$node_primary->append_conf('postgresql.conf', "recovery_target_name = ...
recovery_target_time = ...");
$node_primary->start;
...
ALTER SYSTEM RESET recovery_target_name;
ALTER SYSTEM RESET recovery_target_time;
$node_primary->restart;
append_conf permanently extends postgresql.conf. The ALTER SYSTEM
RESET writes to postgresql.auto.conf which takes precedence, so the
stale postgresql.conf lines are effectively masked for subsequent
test steps, but they remain in the file, and any later test that
relies on a predictable postgresql.conf content (for example a test
that inspects ConfigFileVar or does its own append) could be
confused. Using ALTER SYSTEM SET for the setup would be cleaner,
or using a dedicated temporary cluster for this one case. Also,
the ok() predicate `defined $primary_pid && $primary_pid ne ''` is
redundant. safe_psql would have died earlier on start failure.
5. Nit: block comment.
The new block inside validateRecoveryParameters() could benefit from
noting that ArchiveRecoveryRequested is guaranteed true at this
point (because of the early return above it), so a reader does not
have to scroll up to confirm.
Nothing here is a blocker. I think the overall direction is right.
The behavioral change question in item 1 is the main design decision
I would want the author's thinking on.
Thanks,
Greg
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2026-04-24 01:10:47 | FOR PORTION OF gram.y target_location seems wrong |
| Previous Message | Jeff Davis | 2026-04-24 00:47:11 | Re: sandboxing untrusted code |