Re: The danger of deleting backup_label

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: The danger of deleting backup_label
Date: 2023-10-18 12:39:03
Message-ID: CA+TgmoZJ+WcTsPxzvFbTbV4M7wiyjftL0tyfZOmimeQ0x4V2=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 17, 2023 at 4:17 PM David Steele <david(at)pgmasters(dot)net> wrote:
> Given that the above can't be back patched, I'm thinking we don't need
> backup_label at all going forward. We just write the values we need for
> recovery into pg_control and return *that* from pg_backup_stop() and
> tell the user to store it with their backup. We already have "These
> files are vital to the backup working and must be written byte for byte
> without modification, which may require opening the file in binary
> mode." in the documentation so dealing with pg_control should not be a
> problem. pg_control also has a CRC so we will know if it gets munged.

Yeah, I was thinking about this kind of idea, too. I think it might be
a good idea, although I'm not completely certain about that, either.

On the positive side, you can't remove backup_label in error if
backup_label is not a thing. You certainly can't remove the control
file. You can, however, use the original control file instead of the
one that you were supposed to use. However, that is not really any
different from failing to write the backup_label into the backup
directory, which you can already do today. Also, it adds very little
net complexity to the low-level backup procedure. Instead of needing
to write the backup_label into the backup directory, you write the
control file -- but that's instead, not in addition. So overall it
seems like the complexity is similar to what we have today but one
possible mistake is eliminated.

Also on the positive side, I suspect we could remove a decent amount
of code for dealing with backup_label files. We wouldn't have to read
them any more (and the code to do that is pretty rough-and-ready) and
we wouldn't have to do different things based on whether the
backup_label exists or not. The logic in xlog*.c is extremely
complicated, and everything that we can do to reduce the number of
cases that need to be considered is not just good, but great.

But there are also some negatives.

First, anything that is stored in the backup_label but not the control
file has to (a) move into the control file, (b) be stored someplace
else, or (c) be eliminated as a concept. We're likely to get
complaints about (a), especially if the data in question is anything
big. Any proposal to do (b) risks undermining the whole theory under
which this is a good proposal, namely that removing backup_label gives
us one less thing to worry about. So that brings us to (c).
Personally, I would lose very little sleep if the LABEL field died and
never came back, and I wouldn't miss START TIME and STOP TIME either,
but somebody else might well feel differently. I don't think it's
trivial to get rid of BACKUP METHOD, as there unfortunately seems to
be code that depends on knowing the difference between BACKUP FROM:
streamed and BACKUP FROM: pg_rewind. I suspect that BACKUP FROM:
primary/standby might have the same issue, but I'm not sure. STOP
TIMELINE could be a problem too. I think that if somebody could do
some rejiggering to eliminate some of the differences between the
cases here, that could be really good general cleanup irrespective of
what we decide about this proposal, and moving some things in to
pg_control is probably reasonable too. For instance, it would seem
crazy to me to argue that storing the backup end location in the
control file is OK, but storing the backup end TLI there would not be
OK. But the point here is that there's probably a good deal of careful
thinking that would need to be done here about exactly where all of
the stuff that currently exists in the backup_label file but not in
pg_control needs to end up.

Second, right now, the stuff that we return at the end of a backup is
all text data. With this proposal, it becomes binary data. I entirely
realize that people should only be doing these kinds of backups using
automated tools that that those automated tools should be perfectly
capable of handling binary data without garbling anything. But that's
about as realistic as supposing that people won't instantly remove the
backup_label file the moment it seems like it will solve some problem,
even when the directions clearly state that this should only be done
in some other situation that is not the one the user is facing. It
just occurred to me that one thing we could do to improve the user
experience here is offer some kind of command-line utility to assist
with taking a low-level backup. This could be done even if we don't
proceed with this proposal e.g.

pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMAND

I don't know for sure how much that would help, but I wonder if it
might actually help quite a bit, because right now people do things
like use psql in a shell script to try to juggle a database connection
and then in some other part of the shell script do the data copying.
But it is quite easy to screw up the error handling or the psql
session lifetime or something like that, and this would maybe provide
a nicer interface. Details likely need a good deal of kibitizing.

There might be other problems, too. This is just what occurs to me off
the top of my head. But I think it's an interesting angle to explore
further.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2023-10-18 13:04:13 Re: run pgindent on a regular basis / scripted manner
Previous Message Thomas Munro 2023-10-18 12:06:11 Re: LLVM 16 (opaque pointers)