Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, zxwsbg12138(at)gmail(dot)com, david(dot)zhang(at)highgo(dot)ca
Subject: Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
Date: 2023-10-30 19:47:41
Message-ID: 20231030194741.achmawmgheibz73i@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-10-30 16:08:50 +0900, Michael Paquier wrote:
> From 26a8432fe3ab8426e7797d85d19b0fe69d3384c9 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael(at)paquier(dot)xyz>
> Date: Mon, 30 Oct 2023 16:02:52 +0900
> Subject: [PATCH v4] Require recovery.signal or standby.signal when reading a
> backup_file
>
> Historically, the startup process uses two static variables to control
> if archive recovery should happen, when either recovery.signal or
> standby.signal are defined in the data folder at the beginning of
> recovery:

I think the problem with these variables is that they're a really messy state
machine - something this patch doesn't meaningfully improve IMO.

> This configuration was possible when recovering from a base backup taken
> by pg_basebackup without -R. Note that the documentation requires at
> least to set recovery.signal to restore from a backup, but the startup
> process was not making this policy explicit.

Maybe I just didn't check the right place, but from what I saw, this is at
most implied, rather than explicitly stated.

> In most cases, one would have been able to complete recovery, but that's a
> matter of luck, really, as it depends on the workload of the origin server.

With -X ... we have all the necessary WAL locally, so how does the workload on
the primary matter? If you pass --no-slot, pg_basebackup might fail to fetch
the necessary WAL, but then you'd also have gotten an error.

I agree with Robert that this would be a good error check on a green field,
but I am less convinced it's going to help more than hurt now.

Right now running pg_basebackup with -X stream, without --write-recovery-conf,
gives you a copy of a cluster that will come up correctly as a distinct
instance.

With this change applied, you need to know that the way to avoid the existing
FATAL about restore_command at startup (when recovery.signal exists but
restore_command isn't set) is to set "restore_command = false", something we
don't explain anywhere afaict. We should lessen the need to ever use
restore_command, not increase it.
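
As a rough sketch of what that workaround would look like on a clone taken
without --write-recovery-conf: this assumes the patch is applied and that an
empty recovery.signal file has been created in the clone's data directory. The
quoted spelling below is just one valid way to write the setting and is not
taken from the patch.

```
# postgresql.conf on the clone (illustrative sketch, not from the patch)
#
# A non-empty restore_command satisfies the startup check that raises the
# FATAL when recovery.signal exists. 'false' is the shell command that always
# exits non-zero, so every attempt to fetch a WAL segment or a timeline
# history file from an archive fails, and recovery relies only on the WAL
# already present in pg_wal.
restore_command = 'false'
```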

It also seems risky to have people get used to restore_command = false,
because that effectively disables detection of other timelines etc. But, this
method does force a new timeline - which will be the same on each clone of the
database...

I also just don't think that it's always desirable to create a new timeline.

Greetings,

Andres Freund
