Re: Add recovery to pg_control and remove backup_label

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-13 04:36:56
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 10, 2023 at 02:55:19PM -0400, David Steele wrote:
> On 11/10/23 00:37, Michael Paquier wrote:
>> I've done a few more dozen runs, and still nothing. I am wondering
>> what this disturbance was.
> OK, hopefully it was just a blip.

Still nothing on this side. So that seems really like a random blip
in the matrix.

> This has been split out.

Thanks, applied 0001.

>> The term "backup recovery", that we've never used in the tree until
>> now as far as I know. Perhaps this recovery method should just be
>> referred as "recovery from backup"?
> Well, "backup recovery" is less awkward, I think. For instance "backup
> recovery field" vs "recovery from backup field".

Not sure. I've never used this term when referring to recovery from a
backup. Perhaps I'm just not used to it, still that sounds a bit
confusing here.

>> Something in this area is that backupRecoveryRequired is the switch
>> controlling if the fields set by the recovery initialization. Could
>> it be actually useful to leave the other fields as they are and only
>> reset backupRecoveryRequired before the first control file update?
>> This would leave a trace of the backup history directly in the control
>> file.
> Since the other recovery fields are cleared in ReachedEndOfBackup() this
> would be a change from what we do now.
> None of these fields are ever visible (with the exception of
> minRecoveryPoint/TLI) since they are reset when the database becomes
> consistent and before logons are allowed. Viewing them with pg_controldata
> makes sense, but I'm not sure pg_control_recovery() does.
> In fact, are backup_start_lsn, backup_end_lsn, and
> end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe
> I'm missing something?

Yeah, but custom backup/restore tools may want manipulate the contents
of the control file for their own work, so at least for the sake of
visibility it sounds important to me to show all the information at
hand, and that there is no need to.

- The backup label
- file includes the label string you gave to <function>pg_backup_start</function>,
+ The backup history file (which is archived like WAL) includes the label
+ string you gave to <function>pg_backup_start</function>,
as well as the time at which <function>pg_backup_start</function> was run, and
the name of the starting WAL file. In case of confusion it is therefore
- possible to look inside a backup file and determine exactly which
+ possible to look inside a backup history file and determine exactly which

As a side note, it is a bit disappointing that we lose the backup
label from the backup itself, even if the patch changes correctly the
documentation to reflect the new behavior. It is in the backup
history file on the node from where the base backup has been taken or
in the archives, hopefully. However there is nothing that remains in
the base backup itself, and backups can be self-contained (easy with
pg_basebackup --wal-method=stream). I think that we should retain a
minimum amount of information as a replacement for the backup_label,
at least. With this state, the current patch slightly reduces the
debuggability of deployments. That could be annoying for some users.

> New patches attached based on eb81e8e790.

Diving into the code for references about the backup label file, I
have spotted this log in pg_rewind that is now incorrect:
if (showprogress)
pg_log_info("creating backup label and updating control file");

+ printf(_("Backup start location's timeline: %u\n"),
+ ControlFile->backupStartPointTLI);
printf(_("Backup end location: %X/%X\n"),
Perhaps these two should be reversed to match with the header file.

+ /*
+ * After pg_backup_stop() returns this field will contain a copy of
+ * pg_control that should be stored with the backup. Fields have been
+ * updated for recovery and the CRC has been recalculated. The buffer
+ * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always
+ * a consistent size but smaller (and hopefully easier to handle) than
+ * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed.
+ */
+ uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE];

I don't mind the addition of a control file with the max safe size,
because it will never be higher than that. However:

+ /* End the backup before sending pg_control */
+ basebackup_progress_wait_wal_archive(&state);
+ do_pg_backup_stop(backup_state, !opt->nowait);
+ /* Send copy of pg_control containing recovery info */
+ sendFileWithContent(sink, XLOG_CONTROL_FILE,
+ (char *)backup_state->controlFile,

It seems to me that the base backup protocol should always send an 8k
file for the control file so as we maintain consistency with the
on-disk format. Currently, a base backup taken with this patch
results in a control file of size 512B.

+ /* Build the contents of pg_control */
+ pg_control_bytea = (bytea *) palloc(PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ);
+ memcpy(VARDATA(pg_control_bytea), backup_state->controlFile, PG_CONTROL_MAX_SAFE_SIZE);

Similar comment for the control file returned by pg_backup_stop(),
which could just be made a 8k field?

+ <function>pg_backup_stop</function> returns the
+ <filename>pg_control</filename> file, which must be stored in the
+ <filename>global</filename> directory of the backup. It also returns the

And perhaps emphasize that this file should be an 8kB file in the
paragraph mentioning the data returned by pg_backup_stop()?

- Create a <filename>backup_label</filename> file to begin WAL replay at
+ Update <filename>pg_control</filename> file to begin WAL replay at
the checkpoint created at failover and configure the
<filename>pg_control</filename> file with a minimum consistency LSN

pg_control is mentioned twice, so perhaps this could be worded better?

PG_CONTROL_VERSION is important to not forget about.. Perhaps this
should be noted somewhere, or just changed in the patch itself.
Contrary to catalog changes, we do few of these in the control file so
there is close to zero risk of conflicts with other patches in the CF

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-11-13 04:49:39 Re: A recent message added to pg_upgade
Previous Message shveta malik 2023-11-13 04:24:50 Re: Synchronizing slots from primary to standby