Re: Add recovery to pg_control and remove backup_label

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-06 06:35:28
Message-ID: ZUiJMJc6sRDoGJTJ@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote:
> Rebased on 151ffcf6.

I like this patch a lot. Even if the backup_label file is removed, we
still have all the debug information from the backup history file,
thanks to its LABEL, BACKUP METHOD and BACKUP FROM, so no information
is lost. It does a 1:1 replacement of the contents parsed from the
backup_label needed by recovery by fetching them from the control
file. Sounds like a straight-forward change to me.

The patch is failing the recovery test 039_end_of_wal.pl. Could you
look at the failure?

/* Build and save the contents of the backup history file */
- history_file = build_backup_content(state, true);
+ history_file = build_backup_content(state);

build_backup_content() sounds like an incorrect name if it is a
routine onlyused to build the contents of backup history files.

Why is there nothing updated in src/bin/pg_controldata/?

+ /* Clear fields used to initialize recovery */
+ ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+ ControlFile->backupStartPointTLI = 0;
+ ControlFile->backupRecoveryRequired = false;
+ ControlFile->backupFromStandby = false;

These variables in the control file are cleaned up when the
backup_label file was read previously, but backup_label is renamed to
backup_label.old a bit later than that. Your logic looks correct seen
from here, but shouldn't these variables be set much later, aka just
*after* UpdateControlFile(). This gap between the initialization of
the control file and the in-memory reset makes the code quite brittle,
IMO.

- basebackup_progress_wait_wal_archive(&state);
- do_pg_backup_stop(backup_state, !opt->nowait);

Why is that moved?

- The backup label
- file includes the label string you gave to <function>pg_backup_start</function>,
- as well as the time at which <function>pg_backup_start</function> was run, and
- the name of the starting WAL file. In case of confusion it is therefore
- possible to look inside a backup file and determine exactly which
- backup session the dump file came from. The tablespace map file includes
+ The tablespace map file includes

It may be worth mentioning that the backup history file holds this
information on the primary's pg_wal, as well.

The changes in sendFileWithContent() may be worth a patch of its own.

--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -146,6 +146,9 @@ typedef struct ControlFileData
@@ -160,14 +163,25 @@ typedef struct ControlFileData
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
+ XLogRecPtr backupCheckPoint;
XLogRecPtr backupStartPoint;
+ TimeLineID backupStartPointTLI;
XLogRecPtr backupEndPoint;
+ bool backupRecoveryRequired;
+ bool backupFromStandby;

This increases the size of the control file from 296B to 312B with an
8-byte alignment, as far as I can see. The size of the control file
has been always a sensitive subject especially with the hard limit of
PG_CONTROL_MAX_SAFE_SIZE. Well, the point of this patch is that this
is the price to pay to prevent users from doing something stupid with
a removal of a backup_label when they should not. Do others have an
opinion about this increase in size?

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
reduce more the size with some alignment magic, no?

- /*
- * BACKUP METHOD lets us know if this was a typical backup ("streamed",
- * which could mean either pg_basebackup or the pg_backup_start/stop
- * method was used) or if this label came from somewhere else (the only
- * other option today being from pg_rewind). If this was a streamed
- * backup then we know that we need to play through until we get to the
- * end of the WAL which was generated during the backup (at which point we
- * will have reached consistency and backupEndRequired will be reset to be
- * false).
- */
- if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1)
- {
- if (strcmp(backuptype, "streamed") == 0)
- *backupEndRequired = true;
- }

backupRecoveryRequired in the control file is switched to false for
pg_rewind and true for streamed backups. My gut feeling is telling me
that this should be OK, as out-of-core tools would need an upgrade if
they relied on the backend_label file anyway. I can see that this
change makes use lose some documentation, unfortunately. Shouldn't
these removed lines be moved to pg_control.h instead for the
description of backupEndRequired?

doc/src/sgml/ref/pg_rewind.sgml and
src/backend/access/transam/xlogrecovery.c still include references to
the backup_label file.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-11-06 06:46:16 RE: Is this a problem in GenericXLogFinish()?
Previous Message Ashutosh Bapat 2023-11-06 06:32:12 Re: RFC: Logging plan of the running query