Re: backup manifests

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-26 20:37:47
Message-ID: 16538d02-fd4b-6c4f-81a5-132c8fe8c3e9@pgmasters.net
Lists: pgsql-hackers

On 3/26/20 11:37 AM, Robert Haas wrote:
>> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> This is where I feel like I'm trying to make decisions in a vacuum. If
> we had a few more people weighing in on the thread on this point, I'd
> be happy to go with whatever the consensus was. If most people think
> having both --no-manifest (suppressing the manifest completely) and
> --manifest-checksums=none (suppressing only the checksums) is useless
> and confusing, then sure, let's rip the latter one out. If most people
> like the flexibility, let's keep it: it's already implemented and
> tested. But I hate to base the decision on what one or two people
> think.

I'm not sure I see a lot of value in being able to build a manifest with
no checksums, especially if the overhead for the default checksum
algorithm is negligible.

However, I'd still prefer that the default be something more robust and
allow users to tune it down rather than the other way around. But I've
made that pretty clear up-thread and I consider that argument lost at
this point.

>> As for folks who are that close to the edge on their backup timing that
>> they can't have it slow down- chances are pretty darn good that they're
>> not far from ending up needing to find a better solution than
>> pg_basebackup anyway. Or they don't need to generate a manifest (or, I
>> suppose, they could have one but not have checksums..).
>
> 40-50% is a lot more than "if you were on the edge."

For the record, I think this is a very misleading number. Sure, if you
are doing your backup to a local SSD on a powerful development laptop,
it makes sense.

But backups are generally placed on slower storage, remotely, with
compression. Even without compression the first two are going to bring
this percentage down by a lot.
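
To illustrate the arithmetic behind that point, here is a rough sketch.
It assumes checksumming runs serially with the transfer (a worst case,
with no overlap), and every throughput figure is hypothetical, chosen
only to show the shape of the effect. None of them are measurements.

```python
def checksum_overhead_pct(data_mb, backup_mb_per_s, checksum_mb_per_s):
    """Extra time spent checksumming, as a percentage of the backup time
    without checksums.

    Assumption: checksum time is simply added to transfer time.
    """
    base_seconds = data_mb / backup_mb_per_s
    checksum_seconds = data_mb / checksum_mb_per_s
    return 100.0 * checksum_seconds / base_seconds


# Hypothetical throughputs in MB/s, for illustration only.
fast_local = checksum_overhead_pct(10_000, backup_mb_per_s=1000, checksum_mb_per_s=2000)
slow_remote = checksum_overhead_pct(10_000, backup_mb_per_s=100, checksum_mb_per_s=2000)

print(f"fast local target:  {fast_local:.0f}% slower")
print(f"slow remote target: {slow_remote:.0f}% slower")
```

Under these assumptions the checksum cost per byte stays the same, so
the slower the backup target, the smaller the share of total time the
checksum accounts for.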

When you get to page-level incremental backups, which is where this all
started, I'd still recommend using a stronger checksum algorithm to
verify that the file was reconstructed correctly on restore. That much
I believe we have agreed on.
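
As a minimal sketch of what such a check could look like, the following
hashes a restored file with SHA-256 and compares it to an expected
digest. SHA-256 is used here only as an example of a stronger
algorithm; the function names and the idea of passing the expected
digest as a hex string are assumptions for illustration, not anything
taken from the patch.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path, expected_hex):
    """Hypothetical helper: True if the file's digest equals the expected one."""
    return file_sha256(path) == expected_hex.lower()
```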

>> Even pg_basebackup (in both fetch and stream modes...) checks that we at
>> least got all the WAL that's needed for the backup from the server
>> before considering the backup to be valid and telling the user that
>> there was a successful backup. With what you're proposing here, we
>> could have someone do a pg_basebackup, get back an ERROR saying the
>> backup wasn't valid, and then run pg_validatebackup and be told that the
>> backup is valid. I don't get how that's sensible.
>
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that everything that the
> validation tool try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

I agree with Stephen that WAL validation should be done, but I agree
with you that it can wait for a future commit. However, I do think:

1) It should be called out rather plainly in the documentation.
2) If there are files in pg_wal then pg_validatebackup should inform the
user that those files have not been validated.
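
Point 2 could look something like the sketch below. It is written in
Python purely for illustration (the real tool is C), and the function
names and message wording are hypothetical. It only assumes that WAL
lives in a pg_wal directory at the top of the backup.

```python
import os


def unvalidated_wal_files(backup_dir):
    """List regular files under <backup_dir>/pg_wal (subdirectories are skipped)."""
    wal_dir = os.path.join(backup_dir, "pg_wal")
    if not os.path.isdir(wal_dir):
        return []
    return sorted(
        name for name in os.listdir(wal_dir)
        if os.path.isfile(os.path.join(wal_dir, name))
    )


def wal_notice(backup_dir):
    """Return a notice string if pg_wal holds files, otherwise None."""
    files = unvalidated_wal_files(backup_dir)
    if not files:
        return None
    return f"NOTICE: {len(files)} file(s) in pg_wal were not validated"
```

A caller would print the notice, when there is one, after the rest of
the validation has finished.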

I know you and Stephen have agreed on a number of doc changes. Would it
be possible to get a new patch with those included? I finally have time
to do a review of this tomorrow. I saw some mistakes in the docs in the
current patch, but I know those patches are not up to date.

Regards,
--
-David
david(at)pgmasters(dot)net
