Re: Add recovery to pg_control and remove backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-21 18:13:58
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/20/23 16:37, Andres Freund wrote:
> On 2023-11-20 11:11:13 -0500, Robert Haas wrote:
>> I think we need more votes to make a change this big. I have a
>> concern, which I think I've expressed before, that we keep whacking
>> around the backup APIs, and that has a cost which is potentially
>> larger than the benefits.
> +1. The amount of whacking around in this area has been substantial, and it's
> hard for operators to keep up. And realistically, with data sizes today, the
> pressure to do basebackups with disk snapshots etc is not going to shrink.

True enough, but disk snapshots aren't really backups in themselves, in
most scenarios, because they reside on the same storage as the cluster.
Of course, snapshots can be exported, but that's also expensive.

I see snapshots as an adjunct to backups -- a safe backup offsite
somewhere for DR and snapshots for day to day operations. Even so,
managing snapshots as backups is harder than people think. It is easy to
get wrong and end up with silent corruption.

> Leaving that concern aside, I am still on the fence about this proposal. I
> think it does decrease the chance of getting things wrong in the
> streaming-basebackup case. But for external backups, it seems almost
> universally worse (with the exception of the torn pg_control issue, that we
> also can address otherwise):

Why universally worse? The software stores pg_control instead of backup
label. The changes to pg_basebackup were pretty trivial and the changes
to external backup are pretty much the same, at least in my limited
sample of one.

And I don't believe we have a satisfactory solution to the torn
pg_control issue yet. Certainly it has not been committed and Thomas has
shown enthusiasm for this approach, to the point of hoping it could be
back patched (it can't).

> It doesn't reduce the risk of getting things wrong, you can still omit placing
> a file into the data directory and get silent corruption as a consequence. In
> addition, it's harder to see when looking at a base backup whether the process
> was right or not, because now the good and bad state look the same if you just
> look on the filesystem level!

This is one of the reasons I thought writing just the first 512 bytes of
pg_control would be valuable. It would give an easy indicator that
pg_control came from a backup. Michael was not in favor of conflating
that change with this patch -- but I still think it's a good idea.

> Then there's the issue of making ad-hoc debugging harder by not having a
> human readable file with information anymore, including when looking at the
> history, via backup_label.old.

Yeah, you'd need to use pg_controldata instead. But as Michael has
suggested, we could also write backup_label as backup_info so there is
human-readable information available.

> Given that, I wonder if what we should do is to just add a new field to
> pg_control that says "error out if backup_label does not exist", that we set
> when creating a streaming base backup

I'm not in favor of a change only accessible to pg_basebackup or
external software that can manipulate pg_control.


In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-21 18:16:19 Re: Locks on unlogged tables are locked?!
Previous Message Tom Lane 2023-11-21 18:11:02 Re: PSQL error: total cell count of XXX exceeded