Re: backup manifests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-24 18:04:24
Message-ID: CA+TgmoYdFx_Hm4TUmc2qxLvaUo1KXH2yuxBk1RrgT5Oj+PwvNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 23, 2020 at 6:42 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> While I get the desire to have a default here that includes checksums,
> the way the command is structured, it strikes me as odd that the lack of
> MANIFEST_CHECKSUMS in the command actually results in checksums being
> included.

I don't think that's quite accurate, because the default for the
MANIFEST option is 'no', so the actual default if you say nothing
about manifests at all, you don't get one. However, it is true that if
you ask for a manifest and you don't specify the type of checksums,
you get CRC-32C. We could change it so that if you ask for a manifest
you must also specify the type of checksum, but I don't see any
advantage in that approach. Nothing prevents the client from
specifying the value if it cares, but making the default "I don't
care, you pick" seems pretty sensible. It could be really helpful if,
for example, we decide to remove the initial default in a future
release for some reason. Then the client just keeps working without
needing to change anything, but anyone who explicitly specified the
old default gets an error.

> > + Disables generation of a backup manifest. If this option is not
> > + specified, the server will and send generate a backup manifest
> > + which can be verified using <xref linkend="app-pgvalidatebackup" />.
> > + </para>
> > + </listitem>
> > + </varlistentry>
>
> How about "If this option is not specified, the server will generate and
> send a backup manifest which can be verified using ..."

Good suggestion. :-)

> > + <varlistentry>
> > + <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
> > + <listitem>
> > + <para>
> > + Specifies the algorithm that should be used to checksum each file
> > + for purposes of the backup manifest. Currently, the available
> > + algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
> > + <literal>SHA224</literal>, <literal>SHA256</literal>,
> > + <literal>SHA384</literal>, and <literal>SHA512</literal>.
> > + The default is <literal>CRC32C</literal>.
> > + </para>
>
> As I recall, there was an invitation to argue about the defaults at one
> point, and so I'm going to say here that I would advocate for a
> different default than 'crc32c'. Specifically, I would think sha256 or
> 512 would be better. I don't recall seeing a debate about this that
> conclusively found crc32c to be better, but I'm happy to go back and
> reread anything someone wants to point me at.

It was discussed upthread. Andrew Dunstan argued that there was no
reason to use a cryptographic checksum here and that we shouldn't do
so gratuitously. Suraj Kharage found that CRC-32C has very little
performance impact but that any of the SHA functions slow down backups
considerably. David Steele pointed out that you'd need a better
checksum if you wanted to use it for purposes such as delta restore,
with which I agree, but that's not the design center for this feature.
I concluded that different people wanted different things, so that we
ought to make this configurable, but that CRC-32C is a good default.
It has approximately a 99.9999999767169% chance of detecting a random
error, which is pretty good, and it doesn't drastically slow down
backups, which is also good.

> It also seems a bit silly to me that using the defaults means having to
> deal with two different algorithms- crc32c and sha256. Considering how
> fast these algorithms are, compared to everything else involved in a
> backup (particularly one that's likely going across a network...), I
> wonder if we should say "may slightly increase" above.

Actually, Suraj's results upthread show that it's a pretty big hit.

> > + <para>
> > + On the other hand, <literal>CRC32C</literal> is not a cryptographic
> > + hash function, so it is only suitable for protecting against
> > + inadvertent or random modifications to a backup. An adversary
> > + who can modify the backup could easily do so in such a way that
> > + the CRC does not change, whereas a SHA collision will be hard
> > + to manufacture. (However, note that if the attacker also has access
> > + to modify the backup manifest itself, no checksum algorithm will
> > + provide any protection.) An additional advantage of the
> > + <literal>SHA</literal> family of functions is that they output
> > + a much larger number of bits.
> > + </para>
>
> I'm not really sure that this paragraph is sensible to include.. We
> certainly don't talk about adversaries and cryptographic hash functions
> when we talk about our page-level checksums, for example. I'm not
> completely against including it, but I don't want to give the impression
> that this is something we routinely consider or that lack of discussion
> elsewhere implies we have protections against a determined attacker.

Given the skepticism from some quarters about CRC-32C on this thread,
I didn't want to oversell it. Also, I do think that these things are
possibly things that we should consider more widely. I agree with
Andrew's complaint that it's far too easy to just throw SHA<lots> at
problems that don't really require it without any actually good
reason. Spelling out our reasons for choosing certain algorithms for
certain purposes seems like a good habit to get into, and if we
haven't done it in other places, maybe we should. On the other hand,
while I'm inclined to keep this paragraph, I won't lose much sleep if
we decide to remove it.

> > + <refnamediv>
> > + <refname>pg_validatebackup</refname>
> > + <refpurpose>verify the integrity of a base backup of a
> > + <productname>PostgreSQL</productname> cluster</refpurpose>
> > + </refnamediv>
>
> "verify the integrity of a backup taken using pg_basebackup"

OK.

> This seems to invite the idea that pg_validatebackup should be able to
> work with external backup solutions- but I'm a bit concerned by that
> idea because it seems like it would then mean we'd have to be
> particularly careful when changing things in this area, and I'm not
> thrilled by that. I'd like to make sure that new versions of
> pg_validatebackup work with older backups, and, ideally, older versions
> of pg_validatebackup would work even with newer backups, all of which I
> think the json structure of the manifest helps us with, but that's when
> we're building the manifest and know what it's going to look like.

Both you and David made forceful arguments that this needed to be JSON
rather than an ad-hoc text format precisely so that other tools could
parse it more easily, and I just spent *a lot* of time making the JSON
parsing stuff work precisely so that you could have that. This project
would've been done a month ago if not for that. I don't care all that
much whether we remove the mention here, but the idea that using JSON
was so that pg_validatebackup could manage compatibility issues is
just not correct. The version number on line 1 of the file was more
than sufficient for that purpose.

> > + <para>
> > + <application>pg_validatebackup</application> reads the manifest file of a
> > + backup, verifies the manifest against its own internal checksum, and then
> > + verifies that the same files are present in the target directory as in the
> > + manifest itself. It then verifies that each file has the expected checksum,
> > + unless the backup was taken the checksum algorithm set to
>
> "was taken with the checksum algorithm"...

Oops. Will fix.

> > + <itemizedlist>
> > + <listitem>
> > + <para>
> > + <literal>backup_manifest</literal> is ignored because the backup
> > + manifest is logically not part of the backup and does not include
> > + any entry for itself.
> > + </para>
> > + </listitem>
>
> This seems a bit confusing, doesn't it? The backup_manifest must exist,
> and its checksum is internal, and is checked, isn't it? Why say that
> it's excluded..?

Well, there's no entry in the backup manifest for backup_manifest
itself. Normally, the presence of a file not mentioned in
backup_manifest would cause a complaint about an extra file, but
because backup_manifest is in the ignore list, it doesn't.

> > + <listitem>
> > + <para>
> > + <literal>pg_wal</literal> is ignored because WAL files are sent
> > + separately from the backup, and are therefore not described by the
> > + backup manifest.
> > + </para>
> > + </listitem>
>
> I don't agree with the choice to exclude the WAL files, considering
> they're an integral part of a backup, to exclude them means that if
> they've been corrupted at all then the entire backup is invalid. You
> don't want to be discovering that when you're trying to do a restore of
> a backup that you took with pg_basebackup and which pg_validatebackup
> says is valid. After all, the tool being used here, pg_basebackup,
> *does* also stream the WAL files- there's no reason why we can't
> calculate a checksum on them and store that checksum somewhere and use
> it to validate the WAL files. This, in my opinion, is actually a
> show-stopper for this feature. Claiming it's a valid backup when we
> don't check the absolutely necessary-for-restore WAL is making a false
> claim, no matter how well it's documented.

The default for pg_basebackup is -Xstream, which means that the WAL
files are being sent over a separate connection that has no connection
to the original session. The server, when generating the backup
manifest, has no idea what WAL files are being sent over that separate
connection, and thus cannot include them in the manifest. This problem
could be "solved" by having the client generate the manifest rather
than the server, but I think that cure would be worse than the
disease. As it stands, the manifest provides some protection against
transmission errors, which would be lost with that design. As you
point out, this clearly can't be done with -Xnone. I think it would be
possible to support this with -Xfetch, but we'd have to have the
manifest itself specify whether or not it included files in pg_wal,
which would require complicating the format a bit. I don't think that
makes sense. I assume -Xstream is the most commonly-used mode, because
the default used to be -Xfetch and we changed it, which I think we
would not have done unless people liked -Xstream significantly better.
Adding complexity to cater to a non-default case which I suspect is
not widely used doesn't really make sense to me.

In the future, we might want to consider improvements which could make
validation of pg_wal feasible in common cases. Specifically, suppose
that pg_basebackup could receive the manifest from the server, keep
all the entries for the existing files just as they are, but add
entries for WAL files and anything else it may have added to the
backup, recompute the manifest checksum, and store the resulting
revised manifest with the backup. That, I think, would be fairly cool,
but it's a significant body of additional development work, and this
is already quite a large patch. The patch itself has grown to about
3000 lines, and has already 10 preparatory commits doing another ~1500
lines of refactoring to prepare for it.

> Not really thrilled with this (pg_basebackup certainly could figure out
> the checksum for those files...), but I also don't think it's a huge
> issue as they can be recreated by a user (unlike a WAL file..).

Yeah, same issues, though. Here again, there are several possible
fixes: (1) make the server modify those files rather than letting
pg_basebackup do it; (2) make the client compute the manifest rather
than the server; (3) have the client revise the manifest. (3) makes
most sense to me, but I think that it would be better to return to
that topic at a later date. This is certainly not a perfect feature as
things stand but I believe it is good enough to provide significant
benefits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-03-24 18:26:52 Re: Add A Glossary
Previous Message Etsuro Fujita 2020-03-24 18:03:11 Re: [HACKERS] advanced partition matching algorithm for partition-wise join