The danger of deleting backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: The danger of deleting backup_label
Date: 2023-09-28 21:14:22
Message-ID: 1330cb48-4e47-03ca-f2fb-b144b49514d8@pgmasters.net
Lists: pgsql-hackers

Hackers,

While reading through [1] I saw there were two instances where
backup_label was removed to achieve a "successful" restore. This might
work on trivial test restores but is an invitation to (silent) disaster
in a production environment, where the checkpoint stored in backup_label
is almost certain to be earlier than the one stored in pg_control.
Without backup_label, recovery starts from the later checkpoint in
pg_control and skips WAL that the backup needs to become consistent.

A while back I had an idea on how to prevent this, so I decided to give
it a try. Basically, before writing pg_control to the backup I set its
checkpoint location to 0xFFFFFFFFFFFFFFFF.
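
As a rough illustration of the idea (not the code used in the experiment
above), the sketch below overwrites an 8-byte checkpoint field in an
in-memory copy of a control file image. The byte offset, the
little-endian layout, and the helper names are assumptions made for
illustration. A real pg_control also carries a CRC that would have to be
recomputed, which is omitted here.

```python
import struct

# Sentinel from the text: a location that can never be a valid checkpoint.
POISON_CHECKPOINT = 0xFFFFFFFFFFFFFFFF

# ASSUMPTION: hypothetical byte offset of the checkpoint field in this
# toy image; do not rely on it matching a real pg_control layout.
CHECKPOINT_OFFSET = 32


def poison_checkpoint(control_image: bytes) -> bytes:
    """Return a copy of control_image with the checkpoint set to the sentinel."""
    buf = bytearray(control_image)
    # "<Q" is an unsigned 64-bit little-endian integer (assumed byte order).
    struct.pack_into("<Q", buf, CHECKPOINT_OFFSET, POISON_CHECKPOINT)
    return bytes(buf)


def read_checkpoint(control_image: bytes) -> int:
    """Read the 8-byte checkpoint field back out of the image."""
    return struct.unpack_from("<Q", control_image, CHECKPOINT_OFFSET)[0]
```

Only the copy written to the backup is modified in this sketch; the
input image is left untouched.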

Recovery worked perfectly as long as backup_label was present and failed
hard when it was not:

LOG: invalid primary checkpoint record
PANIC: could not locate a valid checkpoint record

It's not a very good message, but at least the foot gun has been
removed. We could use this as a special value to give a better message,
and maybe use something a bit more unique like 0xFFFFFFFFFADEFADE (or
whatever) as the value.
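
To make the "special value" idea concrete, here is a simplified model of
the decision recovery would make. It is a sketch, not PostgreSQL code:
the function name, the exception, and the error text are all
hypothetical, and the sentinel is just the example value mentioned
above.

```python
from typing import Optional

# Example sentinel from the text; the final value is still open.
POISON_CHECKPOINT = 0xFFFFFFFFFADEFADE


class MissingBackupLabelError(RuntimeError):
    """Hypothetical error raised when a backup is started without backup_label."""


def choose_start_checkpoint(control_checkpoint: int,
                            label_checkpoint: Optional[int]) -> int:
    """Pick the checkpoint that recovery should start from.

    label_checkpoint is None when backup_label is absent.
    """
    if label_checkpoint is not None:
        # backup_label is present: its checkpoint always wins.
        return label_checkpoint
    if control_checkpoint == POISON_CHECKPOINT:
        # The sentinel shows pg_control came from a backup, so a missing
        # backup_label is an error rather than a normal crash recovery.
        raise MissingBackupLabelError(
            "pg_control was copied from a backup but backup_label is missing"
        )
    # Ordinary crash recovery: use the checkpoint recorded in pg_control.
    return control_checkpoint
```

With a dedicated sentinel the failure can be reported specifically
instead of as a generic invalid checkpoint record.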

This is all easy enough for pg_basebackup to do, but will certainly be
non-trivial for most backup software to implement. In [2] we have
discussed perhaps returning pg_control from pg_backup_stop() for the
backup software to save, or it could become part of the backup_label
(encoded as hex or base64, presumably). I prefer the latter as this
means less work for the backup software (except for the need to exclude
pg_control from the backup).
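
A minimal sketch of the second option, carrying the control file inside
backup_label as base64, might look like the following. The field name
"PG_CONTROL (base64): " and both helper functions are hypothetical;
nothing like this exists in backup_label today.

```python
import base64

# ASSUMPTION: hypothetical field prefix, not part of the real backup_label.
CONTROL_FIELD = "PG_CONTROL (base64): "


def embed_control(label_text: str, control_image: bytes) -> str:
    """Append the control file image to backup_label text as one base64 line."""
    encoded = base64.b64encode(control_image).decode("ascii")
    if label_text and not label_text.endswith("\n"):
        label_text += "\n"
    return label_text + CONTROL_FIELD + encoded + "\n"


def extract_control(label_text: str) -> bytes:
    """Recover the control file image from backup_label text."""
    for line in label_text.splitlines():
        if line.startswith(CONTROL_FIELD):
            return base64.b64decode(line[len(CONTROL_FIELD):])
    raise ValueError("backup_label does not contain an embedded pg_control")
```

Standard base64 output contains no newlines, so the image fits on a
single line and the existing line-oriented fields are left as they are.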

I don't have a patch for this yet because I did not test this idea using
pg_basebackup, but I'll be happy to work up a patch if there is interest.

I feel like we should do *something* here. If even advanced users are
making this mistake, then we should take it pretty seriously.

Regards,
-David

[1] https://www.postgresql.org/message-id/flat/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8%3D2aw%40mail.gmail.com#746e492bfcd2667635634f1477a61288
[2] https://www.postgresql.org/message-id/CA%2BhUKGKiZJcfZSA5G5Rm8oC78SNOQ4c8az5Ku%3D4wMTjw1FZ40g%40mail.gmail.com
