Re: The danger of deleting backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: The danger of deleting backup_label
Date: 2023-10-17 20:16:42
Message-ID: 0f948866-7caf-0759-d53c-93c3e266ec3f@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/14/23 11:30, David Steele wrote:
> On 10/12/23 10:19, David Steele wrote:
>> On 10/11/23 18:10, Thomas Munro wrote:
>>
>>> As Stephen mentioned[1], we could perhaps also complain if both backup
>>> label and control file exist, and then hint that the user should
>>> remove the *control file* (not the backup label!).  I had originally
>>> suggested we would just overwrite the control file, but by explicitly
>>> complaining about it we would also bring the matter to tool/script
>>> authors' attention, ie that they shouldn't be backing that file up, or
>>> should be removing it in a later step if they copy everything.  He
>>> also mentions that there doesn't seem to be anything stopping us from
>>> back-patching changes to the backup label contents if we go this way.
>>> I don't have a strong opinion on that and we could leave the question
>>> for later.
>>
>> I'm worried about the possibility of back patching this unless the
>> solution comes out to be simpler than I think and that rarely comes to
>> pass. Surely throwing errors on something that is currently valid
>> (i.e. backup_label and pg_control both present).
>>
>> But perhaps there is a simpler, acceptable solution we could back
>> patch (transparent to all parties except Postgres) and then a more
>> advanced solution we could go forward with.
>>
>> I guess I had better get busy on this.
>
> Attached is a very POC attempt at something along these lines that could
> be back patched. I stopped when I realized excluding pg_control from the
> backup is a necessity to make this work (else we still could end up with
> a torn copy of pg_control even if there is a good copy elsewhere). To
> enumerate the back patch issues as I see them:

Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.

It doesn't really matter where/how they store pg_control as long as it
is written back into PGDATA before the cluster starts. If
backupEndRequired, etc., are set appropriately then recovery will do the
right thing when it starts, just as now if PG crashes after it has
renamed backup_label but before recovery to consistency has completed.

We can still enforce the presence of recovery.signal by checking
backupEndRequired if that's something we want but it seems like
backupEndRequired would be enough. I'm fine either way.

Another thing we can do here is make backup from standby easier. The
user won't need to worry about *when* pg_control is copied. We can just
write the ideal min recovery point into pg_control.

Any informational data currently in backup_label can be returned as
columns (like the start/end lsn is now).

This makes the patch much less invasive and while it definitely,
absolutely cannot be back patched, it seems like a good way forward.

This is the direction I'm planning to work on patch-wise but I'd like to
hear people's thoughts.

Regards,
-David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2023-10-17 20:20:58 Re: [PoC/RFC] Multiple passwords, interval expirations
Previous Message Tom Lane 2023-10-17 20:12:52 Re: pg_dump needs SELECT privileges on irrelevant extension table