Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, zxwsbg12138(at)gmail(dot)com, david(dot)zhang(at)highgo(dot)ca
Subject: Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
Date: 2023-10-31 01:15:21
Lists: pgsql-hackers

On Mon, Oct 30, 2023 at 12:47:41PM -0700, Andres Freund wrote:
> I think the problem with these variables is that they're a really messy state
> machine - something this patch doesn't meaningfully improve IMO.

Okay. Yes, this is my root issue as well. We're at the stage where
we should reduce the possible set of combinations and assumptions
we're inventing because people can do undocumented stuff, then perhaps
refactor the code on top of that (say, if one combination with two
booleans is not possible, switch to a three-state enum rather than 2
bools, etc).
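
To make the enum idea concrete, here is a minimal sketch, not code from the
attached patch. It assumes the two booleans are ArchiveRecoveryRequested and
InArchiveRecovery, and that "in archive recovery without having requested it"
is the combination to rule out. The enum, its values and the helper function
are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical three-state replacement for a pair of booleans.
 * These names do not exist in PostgreSQL; they only illustrate the idea.
 */
typedef enum RecoveryMode
{
    RECOVERY_MODE_CRASH,            /* not requested: replay local pg_wal/ only */
    RECOVERY_MODE_ARCHIVE_PENDING,  /* requested, but archive recovery not entered yet */
    RECOVERY_MODE_ARCHIVE_ACTIVE    /* requested, and archive recovery in progress */
} RecoveryMode;

/*
 * Map the two flags onto the enum.  The fourth combination
 * (in archive recovery without it being requested) is assumed to be
 * invalid, so it is rejected by an assertion instead of being represented.
 */
static inline RecoveryMode
recovery_mode_from_flags(bool archive_recovery_requested, bool in_archive_recovery)
{
    assert(!(in_archive_recovery && !archive_recovery_requested));

    if (!archive_recovery_requested)
        return RECOVERY_MODE_CRASH;
    return in_archive_recovery ? RECOVERY_MODE_ARCHIVE_ACTIVE
                               : RECOVERY_MODE_ARCHIVE_PENDING;
}
```

With an enum like this the impossible state cannot be expressed at all, so
callers no longer have to guess how to handle it.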

>> This configuration was possible when recovering from a base backup taken
>> by pg_basebackup without -R. Note that the documentation requires at
>> least to set recovery.signal to restore from a backup, but the startup
>> process was not making this policy explicit.
> Maybe I just didn't check the right place, but from I saw, this, at most, is
> implied, rather than explicitly stated.

See the doc reference here:

So it kind of implies it, but it still also mentions restore_command.
It's like Schrödinger's cat, yes and no at the same time.

> With -X ... we have all the necessary WAL locally, how does the workload on
> the primary matter? If you pass --no-slot, pg_basebackup might fail to fetch
> the necessary wal, but then you'd also have gotten an error.
> [...]
> Right now running pg_basebackup with -X stream, without --write-recovery-conf,
> gives you a copy of a cluster that will come up correctly as a distinct
> instance.
> [...]
> I also just don't think that it's always desirable to create a new timeline.

Yeah. Another argument I was mentioning to Robert is that we may want
to just treat the case where you have a backup_label without any
signal files just the same as crash recovery, replaying all the local
pg_wal/, and nothing else. For example, something like the attached
should make sure that InArchiveRecovery is never set to true if
ArchiveRecoveryRequested is not set.
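
As a rough model of that rule, and not the content of the attached patch, the
decision could look like the sketch below. It assumes archive recovery is
requested by the presence of recovery.signal or standby.signal, and it
deliberately ignores every other input the startup process looks at (control
file state, restore_command, and so on). The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type, modelled on the flag names used in this thread. */
typedef struct RecoveryDecision
{
    bool archive_recovery_requested;    /* a .signal file was found */
    bool standby_mode_requested;        /* standby.signal was found */
    bool in_archive_recovery;           /* begin directly in archive recovery */
} RecoveryDecision;

/*
 * Simplified decision: a backup_label alone never switches to archive
 * recovery.  Without a signal file, startup is treated as crash recovery
 * and only the local pg_wal/ is replayed.
 */
static inline RecoveryDecision
decide_recovery(bool have_backup_label,
                bool have_recovery_signal,
                bool have_standby_signal)
{
    RecoveryDecision d;

    d.standby_mode_requested = have_standby_signal;
    d.archive_recovery_requested = have_recovery_signal || have_standby_signal;
    d.in_archive_recovery = have_backup_label && d.archive_recovery_requested;

    /* The invariant under discussion: never "in" without "requested". */
    assert(!d.in_archive_recovery || d.archive_recovery_requested);
    return d;
}
```

In this model, a base backup taken by pg_basebackup without -R (a
backup_label and no signal file) yields a decision with all three flags
false, which is the crash-recovery case.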

The attached would still cause redo to complain with "WAL ends before
end of online backup" if not all the WAL is here (reason behind the
tweak of, but the previous tweak to pg_rewind's is not required here).

Attached is the idea I had in mind, in terms of code, FWIW.

Attachment: 0001-Force-crash-recovery-with-backup_label-and-no-.signa.patch
Content-Type: text/x-diff
Size: 3.8 KB
