Re: Add recovery to pg_control and remove backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-21 17:17:48
Message-ID: 188e97f4-69d9-4542-b0c1-852fa6b8319b@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/21/23 12:41, Andres Freund wrote:
>
> On 2023-11-21 07:42:42 -0400, David Steele wrote:
>> On 11/20/23 19:58, Andres Freund wrote:
>>> On 2023-11-21 08:52:08 +0900, Michael Paquier wrote:
>>>> On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote:
>>>>> Given that, I wonder if what we should do is to just add a new field to
>>>>> pg_control that says "error out if backup_label does not exist", that we set
>>>>> when creating a streaming base backup
>>>>
>>>> That would mean that one still needs to take an extra step to update a
>>>> control file with this byte set, which is something you had a concern
>>>> with in terms of compatibility when it comes to external backup
>>>> solutions because more steps are necessary to take a backup, no?
>>>
>>> I was thinking we'd just set it in the pg_basebackup style path, and we'd
>>> error out if it's set and backup_label is present. But we'd still use
>>> backup_label without the pg_control flag set.
>>>
>>> So it'd just provide a cross-check that backup_label was not removed for
>>> pg_basebackup style backup, but wouldn't do anything for external backups. But
>>> imo the proposal to just us pg_control doesn't actually do anything for
>>> external backups either - which is why I think my proposal would achieve as
>>> much, for a much lower price.
>>
>> I'm not sure why you think the patch under discussion doesn't do anything
>> for external backups. It provides the same benefits to both pg_basebackup
>> and external backups, i.e. they both receive the updated version of
>> pg_control.
>
> Sure. They also receive a backup_label today. If an external solution forgets
> to replace pg_control copied as part of the filesystem copy, they won't get an
> error after the remove of backup_label, just like they don't get one today if
> they don't put backup_label in the data directory. Given that users don't do
> the right thing with backup_label today, why can we rely on them doing the
> right thing with pg_control?

I think reliable backup software does the right thing with backup_label,
but if the user starts getting errors on recovery they the decide to
remove backup_label. I know we can't do much about bad backup software,
but we can at least make this a bit more resistant to user error after
the fact.

It doesn't help that one of our hints suggests removing backup_label. In
highly automated systems, the user might not even know they just
restored from a backup. They are only in the loop because the restore
failed and they are trying to figure out what is going wrong. When they
remove backup_label the cluster comes up just fine. Victory!

This is the scenario I've seen most often -- not the backup/restore
process getting it wrong but the user removing backup_label on their own
initiative. And because it yields such a positive result, at least
initially, they remember in the future that the thing to do is to remove
backup_label whenever they see the error.

If they only have pg_control, then their only choice is to get it right
or run pg_resetwal. Most users have no knowledge of pg_resetwal so it
will take them longer to get there. Also, I think that tool make it
pretty clear that corruption will result and the only thing to do is a
logical dump and restore after using it.

There are plenty of ways a user can mess things up. What I'd like to
prevent is the appearance of everything being OK when in fact they have
corrupted their cluster. That's the situation we have now with
backup_label. Is this new solution perfect? No, but I do think it checks
several boxes, and is a worthwhile improvement.

Regards,
-David

Regards,
-David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-11-21 17:18:58 Re: Locks on unlogged tables are locked?!
Previous Message Robert Haas 2023-11-21 17:16:41 Re: Partial aggregates pushdown