Re: backup manifests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-26 15:37:48
Message-ID: CA+TgmoZcEXq0SUveFRbqrso1EsDKCn8YXPSyWJBhC-c3fDbO=Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > That's a fair argument, but I think the other relevant principle is
> > that we try to give people useful defaults for things. I think that
> > checksums are a sufficiently useful thing that having the default be
> > not to do it doesn't make sense. I had the impression that you and
> > David were in agreement on that point, actually.
>
> I agree with wanting to have useful defaults and that checksums should
> be included by default, and I'm alright even with letting people pick
> what algorithms they'd like to have too. The construct here is made odd
> because we've got this idea that "no checksum" is an option, which is
> actually something that I don't particularly like, but that's what's
> making this particular syntax weird. I don't suppose you'd be open to
> the idea of just dropping that though..? There wouldn't be any issue
> with this syntax if we just always had checksums included when a
> manifest is requested. :)
>
> Somehow, I don't think I'm going to win that argument.

Well, it's not a crazy idea. At some point, I had the idea that
you were always going to get a manifest, and therefore you should at
least have the option of not checksumming to avoid the
overhead. But, as things stand now, you can suppress the manifest
altogether, so that you can still take a backup even if you've got no
disk space to spool the manifest on the master. So, if you really want
no overhead from manifests, just don't have a manifest. And if you are
OK with some overhead, why not at least have a CRC-32C checksum, which
is, after all, pretty cheap?

Now, on the other hand, I don't have any strong evidence that the
manifest-without-checksums mode is useless. You can still use it to
verify that you have the correct files and that those files have the
expected sizes. And, verifying those things is very cheap, because you
only need to stat() each file, not open and read them all. True, you
can do those things by using pg_validatebackup -s. But, you'd still
incur the (admittedly fairly low) overhead of computing checksums that
you don't intend to use.
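
To spell the distinction out concretely, here is a rough sketch of the
three levels of overhead, using the option names from the current patch
(the paths are just placeholders):

# No manifest at all: no manifest overhead, but nothing to validate later.
pg_basebackup -D /path/to/backup --no-manifest

# Manifest without checksums: file presence and sizes can still be checked.
pg_basebackup -D /path/to/backup --manifest-checksums=none

# Default: manifest with CRC-32C checksums, which are cheap to compute.
pg_basebackup -D /path/to/backup

# Validate only file presence and sizes, skipping checksum verification.
pg_validatebackup -s /path/to/backup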

This is where I feel like I'm trying to make decisions in a vacuum. If
we had a few more people weighing in on the thread on this point, I'd
be happy to go with whatever the consensus was. If most people think
having both --no-manifest (suppressing the manifest completely) and
--manifest-checksums=none (suppressing only the checksums) is useless
and confusing, then sure, let's rip the latter one out. If most people
like the flexibility, let's keep it: it's already implemented and
tested. But I hate to base the decision on what one or two people
think.

> > Well, the 512MB "limit" for CRC-32C means only that for certain very
> > specific types of errors, detection is not guaranteed above that file
> > size. So if you have a single flipped bit, for example, and the file
> > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
> > chance of detecting the error, whereas if the file size is less than
> > 512MB, it is 100% certain, because of the design of the algorithm. But
> > nine nines is plenty, and neither SHA nor our page-level checksums
> > provide guaranteed error detection properties anyway.
>
> Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
> for exactly what it's designed to be useful for, but we also know that
> we're going to have larger files- at least 1GB ones, and quite possibly
> larger, so why are we choosing this?
>
> At the least, wouldn't it make sense to consider a larger CRC, one whose
> limit is above the size of commonly expected files, if we're going to
> use a CRC?

I mean, you're just repeating the same argument here, and it's just
not valid. Regardless of the file size, the chances of a false
checksum match are literally less than one in a billion. There is
every reason to believe that users will be happy with a low-overhead
method that has a 99.9999999+% chance of detecting corrupt files. I do
agree that a 64-bit CRC would probably not be much more expensive and
would improve the probability of detecting errors even further, but I wanted
to restrict this patch to using infrastructure we already have. The
choices there are the various SHA functions (so I supported those),
MD5 (which I deliberately omitted, for reasons I hope you'll be the
first to agree with), CRC-32C (which is fast), a couple of other
CRC-32 variants (which I omitted because they seemed redundant and one
of them only ever existed in PostgreSQL because of a coding mistake),
and the hacked-up version of FNV that we use for page-level checksums
(which is only 16 bits and seems to have no advantages for this
purpose).
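
Just to show where those numbers come from: if you model a corruption
outside the guaranteed-detection cases as producing an effectively
random 32-bit checksum, then

P(undetected) = 2^{-32} ≈ 2.33 x 10^{-10}
P(detected)   = 1 - 2^{-32} ≈ 0.999999999767169, i.e. 99.9999999767%

which is the nine-nines figure quoted above; the chance of a false
match is roughly 1 in 4.3 billion.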

> > I'm not sure why the fact that it's a 40-year-old algorithm is
> > relevant. There are many 40-year-old algorithms that are very good.
>
> Sure there are, but there probably wasn't a lot of thought about
> GB-sized files, and this doesn't really seem to be the direction people
> are going in for larger objects. s3, as an example, uses sha256.
> Google, it seems, suggests folks use "HighwayHash" (from their crc32c
> github repo- https://github.com/google/crc32c). Most CRC uses seem to
> be for much smaller data sets.

Again, I really want to stick with infrastructure we already have.
Trying to find a hash function that will please everybody is a hole
with no bottom, or more to the point, a bikeshed in need of painting.
There are TONS of great hash functions out there on the Internet, and
as previous discussions of pgsql-hackers will attest, as soon as you
go down that road, somebody will say "well, what about xxhash" or
whatever, and then you spend the rest of your life trying to figure
out what hash function we could try to commit that is fast and secure
and doesn't have copyright or patent problems. There have been
multiple efforts to introduce such hash functions in the past, and I
think basically all of those have crashed into a brick wall.

I don't think that's because introducing new hash functions is a bad
idea. I think that there are various reasons why it might be a good
idea. For instance, highwayhash purports to be a cryptographic hash
function that is fast enough to replace non-cryptographic hash
functions. It's easy to see why someone might want that, here. For
example, it would be entirely reasonable to copy the backup manifest
onto a USB key and store it in a vault. Later, if you get the USB key
back out of the vault and validate it against the backup, you pretty
much know that none of the data files have been tampered with,
provided that you used a cryptographic hash. So, SHA is a good option
for people who have a USB key and a vault, and a faster cryptographic
hash might be even better. I don't have any desire to block such proposals,
and I would be thrilled if this work inspires other people to add such
options. However, I also don't want this patch to get blocked by an
interminable argument about which hash functions we ought to use. The
ones we have in core now are good enough for a start, and more can be
added later.
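
To make the vault scenario concrete, here's a sketch using the options
from the current patch (I'm assuming SHA256 is the spelling we end up
with for the algorithm name, and the paths are placeholders):

pg_basebackup -D /path/to/backup --manifest-checksums=SHA256
# stash a trusted copy of the manifest offline (the USB key in the vault)
cp /path/to/backup/backup_manifest /mnt/usbkey/backup_manifest
# much later: put the trusted copy back and check the data files against it
cp /mnt/usbkey/backup_manifest /path/to/backup/backup_manifest
pg_validatebackup /path/to/backup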

> Sure, there's a good chance we'll need newer algorithms in the future, I
> don't doubt that. On the other hand, if crc32c, or CRC whatever, was
> the perfect answer and no one will ever need something better, then
> what's with folks like Google suggesting something else..?

I have never said that CRC was the perfect answer, and the reason why
Google is suggesting something different is because they wanted a fast
hash (not SHA) that still has cryptographic properties. What I have
said is that using CRC-32C by default means that there is very little
downside as compared with current releases. Backups will not get
slower, and error detection will get better. If you pick any other
default from the menu of options currently available, then either
backups get noticeably slower, or we get less error detection
capability than that option gives us.

> As for folks who are that close to the edge on their backup timing that
> they can't have it slow down- chances are pretty darn good that they're
> not far from ending up needing to find a better solution than
> pg_basebackup anyway. Or they don't need to generate a manifest (or, I
> suppose, they could have one but not have checksums..).

40-50% is a lot more than "if you were on the edge."

> > Well, that'd be wrong, though. It's true that backup_manifest won't
> > have an entry in the manifest, and neither will WAL files, but
> > postgresql.auto.conf will. We'll just skip complaining about it if the
> > checksum doesn't match or whatever. The server generates manifest
> > entries for everything, and the client decides not to pay attention to
> > some of them because it knows that pg_basebackup may have made certain
> > changes that were not known to the server.
>
> Ok, but it's also wrong to say that the backup_label is excluded from
> verification.

The docs don't say that backup_label is excluded from verification.
They do say that backup_manifest is excluded from verification
*against the manifest*, because it is. I'm not sure if you're honestly
confused here or if we're just devolving into arguing for the sake of
argument, but right now the code looks like this:

simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");

Notice that this is the same list of files mentioned in the
documentation. Now let's suppose we remove the first of those lines of
code, so that backup_manifest is not in the exclude list by default.
Now let's try to validate a backup:

[rhaas pgsql]$ src/bin/pg_validatebackup/pg_validatebackup ~/pgslave
pg_validatebackup: error: "backup_manifest" is present on disk but not
in the manifest

Oops. If you read that error carefully, you can see that the complaint
is 100% valid. backup_manifest is indeed present on disk, but not in
the manifest. However, because this situation is expected and known
not to be a problem, the right thing to do is suppress the error. That
is why it is in the ignore_list by default. The documentation is
attempting to explain this. If it's unclear, we should try to make it
better, but it is absolutely NOT saying that there is no internal
validation of the backup_manifest. In fact, the previous paragraph
tries to explain that:

+ <application>pg_validatebackup</application> reads the manifest file of a
+ backup, verifies the manifest against its own internal checksum, and then

It is, however, saying, and *entirely correctly*, that
pg_validatebackup will not check the backup_manifest file against the
backup_manifest. If it did, it would find that it's not there. It
would then emit an error message like the one above even though
there's no problem with the backup.
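
And, to be clear, nothing stops a user from widening that ignore list
at validation time. If I'm remembering the switch spelling right,
something like this would additionally ignore a (hypothetical) extra
file the user knows they dropped into the backup directory themselves:

pg_validatebackup --ignore=myextra.conf ~/pgslave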

> I fail to see the usefulness of a tool that doesn't actually verify that
> the backup is able to be restored from.
>
> Even pg_basebackup (in both fetch and stream modes...) checks that we at
> least got all the WAL that's needed for the backup from the server
> before considering the backup to be valid and telling the user that
> there was a successful backup. With what you're proposing here, we
> could have someone do a pg_basebackup, get back an ERROR saying the
> backup wasn't valid, and then run pg_validatebackup and be told that the
> backup is valid. I don't get how that's sensible.

I'm sorry that you can't see how that's sensible, but it doesn't mean
that it isn't sensible. It is totally unrealistic to expect that any
backup verification tool can verify that you won't get an error when
trying to use the backup. That would require that the validation tool
try to do everything that PostgreSQL will try to do
when the backup is used, including running recovery and updating the
data files. Anything less than that creates a real possibility that
the backup will verify good but fail when used. This tool has a much
narrower purpose, which is to try to verify that we (still) have the
files the server sent as part of the backup and that, to the best of
our ability to detect such things, they have not been modified. As you
know, or should know, the WAL files are not sent as part of the
backup, and so are not verified. Other things that would also be
useful to check are also not verified. It would be fantastic to have
more verification tools in the future, but it is difficult to see why
anyone would bother trying if an attempt to get the first one
committed gets blocked because it does not yet do everything. Very few
patches try to do everything, and those that do usually get blocked
because, by trying to do too much, they get some of it badly wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
