Re: Add recovery to pg_control and remove backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-20 13:50:56
Message-ID: 4b5f4feb-a568-4626-9c11-5c221271cf89@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/19/23 21:15, Michael Paquier wrote:
> (I am not exactly sure how, but we've lost pgsql-hackers on the way
> when you sent v5. Now added back in CC with the two latest patches
> you've proposed attached.)

Ugh, I must have hit reply instead of reply all. It's a rookie error and
you hate to see it.

> Here is a short summary of what has been missed by the lists:
> - I've commented that the patch should not create, not show up in
> fields returned the SQL functions or stream control files with a size
> of 512B, just stick to 8kB. If this is worth changing this should be
> applied consistently across the board including initdb, discussed on
> its own thread.
> - The backup-related fields in the control file are reset at the end
> of recovery. I've suggested to not do that to keep a trace of what
> was happening during recovery. The latest version of the patch resets
> the fields.
> - With the backup_label file gone, we lose some information in the
> backups themselves, which is not good. Instead, you have suggested an
> approach where this data is added to the backup manifest, meaning that
> no information would be lost, particularly useful for self-contained
> backups. The fields planned to be added to the backup manifest are:
> -- The start and end time of the backup, the end timestamp being
> useful to know when stop time can be used for PITR.
> -- The backup label.
> I've agreed that it may be the best thing to do at this end to not
> lose any data related to the removal of the backup_label file.

This looks right to me.

> On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote:
>> On 11/15/23 20:03, Michael Paquier wrote:
>>> As the label is only an informational field, the parsing added to
>>> pg_verifybackup is not really needed because it is used nowhere in the
>>> validation process, so keeping the logic simpler would be the way to
>>> go IMO. This is contrary to the WAL range for example, where start
>>> and end LSNs are used for validation with a pg_waldump command.
>>> Robert, any comments about the addition of the label in the manifest?
>>
>> I'm sure Robert will comment on this when he gets the time, but for now I
>> have backed off on passing the new info to pg_verifybackup and added
>> start/stop time.
>
> FWIW, I'm OK with the bits for the backup manifest as presented. So
> if there are no remarks and/or no objections, I'd like to apply it but
> let give some room to others to comment on that as there's been a gap
> in the emails exchanged on pgsql-hackers. I hope that the summary
> I've posted above covers everything. So let's see about doing
> something around the middle of next week. With Thanksgiving in the
> US, a lot of folks will not have the time to monitor what's happening
> on this thread.

Timing sounds good to me.

>
> + The end time for the backup. This is when the backup was stopped in
> + <productname>PostgreSQL</productname> and represents the earliest time
> + that can be used for time-based Point-In-Time Recovery.
>
> This one is actually a very good point. We'd lost this capacity with
> the backup_label file gone without the end timestamps in the control
> file.

Yeah, the end time is very important for recovery scenarios. We
definitely need that recorded somewhere.

> I've noticed on the other thread the remark about being less
> aggressive with the fields related to recovery in the control file, so
> I assume that this patch should leave the fields be after the end of
> recovery from the start and only rely on backupRecoveryRequired to
> decide if the recovery should use the fields or not:
> https://www.postgresql.org/message-id/241ccde1-1928-4ba2-a0bb-5350f7b191a8@=pgmasters.net
>
> + ControlFile->backupCheckPoint = InvalidXLogRecPtr;
> ControlFile->backupStartPoint = InvalidXLogRecPtr;
> + ControlFile->backupStartPointTLI = 0;
> ControlFile->backupEndPoint = InvalidXLogRecPtr;
> + ControlFile->backupFromStandby = false;
> ControlFile->backupEndRequired = false;
>
> Still, I get the temptation of being consistent with the current style
> on HEAD to reset everything, as well..

I'd rather reset everything for now (as we do now) and think about
keeping these values as a separate patch. It may be that we don't want
to keep all of them, or we need a separate flag to say recovery was
completed. We are accumulating a lot of booleans here, maybe we need a
state var (recoveryRequired, recoveryInProgress, recoveryComplete) and
then define which other vars are valid in each state.

Regards,
-David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2023-11-20 13:56:47 Re: Use of backup_label not noted in log
Previous Message Willi Mann 2023-11-20 13:37:39 [PATCH] fix race condition in libpq (related to ssl connections)