Re: backup manifests

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-30 23:24:08
Message-ID: b48e45f8-ba01-2f7b-aa53-e5a8f639e94d@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/30/20 4:16 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 3:51 PM David Steele <david(at)pgmasters(dot)net> wrote:
>
>> > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27
>> 18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
>>
>> Storing the checksum type with each file seems pretty redundant.
>> Perhaps that could go in the header? You could always override if a
>> specific file had a different checksum type, though that seems unlikely.
>>
>> In general it might be good to go with shorter keys: "mod", "chk", etc.
>> Manifests can get pretty big and that's a lot of extra bytes.
>>
>> I'm also partial to using epoch time in the manifest because it is
>> generally easier for programs to work with. But, human-readable doesn't
>> suck, either.
>
> It doesn't seem impossible for it to come up; for example, consider a
> file-level incremental backup facility. You might retain whatever
> checksums you have for the unchanged files (to avoid rereading them)
> and add checksums for modified or added files.

OK.

> I am not convinced that minimizing the size of the file here is a
> particularly important goal, because I don't think it's going to get
> that big in normal cases. I also think having the keys and values be
> easily understandable by human being is a plus. If we really want a
> minimal format without redundancy, we should've gone with what I
> proposed before (though admittedly that could've been tamped down even
> further if we'd cared to squeeze, which I didn't think was important
> then either).

Well, normal cases is the key. But fine, in general we have found that
the in memory representation is more important in terms of supporting
clusters with very large numbers of files.

>> When I ran pg_validatebackup I expected to use -D to specify the backup
>> dir since pg_basebackup does. On the other hand -D is weird because I
>> *really* expect that to be the pg data dir.
>>
>> But, do we want this to be different from pg_basebackup?
>
> I think it's pretty distinguishable, because pg_basebackup needs an
> input (server) and an output (directory), whereas pg_validatebackup
> only needs one. I don't really care if we want to change it, but I was
> thinking of this as being more analogous to, say, pg_resetwal.
> Granted, that's a danger-don't-use-this tool and this isn't, but I
> don't think we want the -D-is-optional behavior that tools like pg_ctl
> have, because having a tool that isn't supposed to be used on a
> running cluster default to $PGDATA seems inadvisable. And if the
> argument is mandatory then it's not clear to me why we should make
> people type -D in front of it.

Honestly I think pg_basebackup is the confusing one, because in most
cases -D points at the running cluster dir. So, OK.

>> > + checksum_length = checksum_string_length / 2;
>>
>> This check is defeated if a single character is added the to checksum.
>>
>> Not too big a deal since you still get an error, but still.
>
> I don't see what the problem is here. We speculatively divide by two
> and allocate memory assuming the value that it was even, but then
> before doing anything critical we bail out if it was actually odd.
> That's harmless. We could get around it by saying:
>
> if (checksum_string_length % 2 != 0)
> context->error_cb(...);
> checksum_length = checksum_string_length / 2;
> checksum_payload = palloc(checksum_length);
> if (!hexdecode_string(...))
> context->error_cb(...);
>
> ...but that would be adding additional code, and error messages, for
> what's basically a can't-happen-unless-the-user-is-messing-with-us
> case.

Sorry, pasted the wrong code and even then still didn't get it quite
right.

The problem:

If I remove an even characters from a checksum it appears the checksum
passes but the manifest checksum fails:

$ pg_basebackup -D test/backup5 --manifest-checksums=SHA256

$ vi test/backup5/backup_manifest
* Remove two characters from the checksum of backup_label

$ pg_validatebackup test/backup5

pg_validatebackup: fatal: manifest checksum mismatch

But if I add any number of characters or remove an odd number of
characters I get:

pg_validatebackup: fatal: invalid checksum for file "backup_label":
"a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fXX"

and no manifest checksum failure.

>> > + * Verify that the manifest checksum is correct.
>>
>> This is not working the way I would expect -- I could freely modify the
>> manifest without getting a checksum error on the manifest. For example:
>>
>> $ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
>> pg_validatebackup: fatal: invalid checksum for file "backup_label":
>> "408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
>>
>> So, if I deleted the entry above, I got a manifest checksum error. But
>> if I just modified the checksum I get a file checksum error with no
>> manifest checksum error.
>>
>> I would prefer a manifest checksum error in all cases where it is wrong,
>> unless --exit-on-error is specified.
>
> I think I would too, but I'm confused as to what you're doing, because
> if I just modified the manifest -- by deleting a file, for example, or
> changing the checksum of a file, I just get:
>
> pg_validatebackup: fatal: manifest checksum mismatch
>
> I'm confused as to why you're not seeing that. What's the exact
> sequence of steps?

$ pg_basebackup -D test/backup5 --manifest-checksums=SHA256

$ vi test/backup5/backup_manifest
* Add 'X' to the checksum of backup_label

$ pg_validatebackup test/backup5
pg_validatebackup: fatal: invalid checksum for file "backup_label":
"a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"

No mention of the manifest checksum being invalid. But if I remove the
backup label file from the manifest:

pg_validatebackup: fatal: manifest checksum mismatch

--
-David
david(at)pgmasters(dot)net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-03-30 23:31:59 Re: Corruption during WAL replay
Previous Message Alvaro Herrera 2020-03-30 23:03:27 Re: [HACKERS] Restricting maximum keep segments by repslots