Re: backup manifests

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-01-03 12:41:45
Message-ID: CAF1DzPUAuY5HqwzBoyFssuN9eRYmTkAKYLPEika1bOTvV2W1YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for review comments.

On Mon, Dec 30, 2019 at 11:53 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Dec 24, 2019 at 5:42 AM Suraj Kharage
> <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> > To examine the first word of each line, I am using below check:
> > if (strncmp(line, "File", 4) == 0)
> > {
> > ..
> > }
> > else if (strncmp(line, "Manifest-Checksum", 17) == 0)
> > {
> > ..
> > }
> > else
> > error
> >
> > strncmp might be not right here, but we can not put '\0' in between the
> line (to find out first word)
> > before we recognize the line type.
> > All the lines expect line last one (where we have manifest checksum) are
> feed to the checksum machinary to calculate manifest checksum.
> > so update_checksum() should be called after recognizing the type, i.e:
> if it is a File type record. Do you see any issues with this?
>
> I see the problem, but I don't think your solution is right, because
> the first test would pass if the line said FiletMignon rather than
> just File, which we certainly don't want. You've got to write the test
> so that you're checking against the whole first word, not just some
> prefix of it. There are several possible ways to accomplish that, but
> this isn't one of them.
>

Yeah. Fixed in the attached patch.

>
> >> + pg_log_error("invalid record found in \"%s\"", manifest_path);
> >>
> >> Error message needs work.
>
> Looks better now, but you have a messages that say "invalid checksums
> type \"%s\" found in \"%s\"". This is wrong because checksums would
> need to be singular in this context (checksum). Also, I think it could
> be better phrased as "manifest file \"%s\" specifies unknown checksum
> algorithm \"%s\" at line %d".
>

Corrected.

>
> >> Your function names should be consistent with the surrounding style,
> >> and with each other, as far as possible. Three different conventions
> >> within the same patch and source file seems over the top.
>
> This appears to be fixed.
>
> >> Also keep in mind that you're not writing code in a vacuum. There's a
> >> whole file of code here, and around that, a whole project.
> >> scan_data_directory() is a good example of a function whose name is
> >> clearly too generic. It's not a general-purpose function for scanning
> >> the data directory; it's specifically a support function for verifying
> >> a backup. Yet, the name gives no hint of this.
>
> But this appears not to be fixed.
>

I have changed this function name to "VerifyDir" likewise, we have sendDir
and sendFile in basebackup.c

>
> >> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
> >> "/backup_manifest") == 0)
> >> continue;
> >
> > Thanks for the suggestion. Corrected as per the above inputs.
>
> You need a comment here, like "Ignore the possible presence of a
> backup_manifest file and/or a pg_wal directory in the backup being
> verified." and then maybe another sentence explaining why that's the
> right thing to do.
>

Corrected.

>
> + * The forth parameter to VerifyFile() will pass the relative
> path
> + * of file to match exactly with the filename present in
> manifest.
>
> I don't know what this comment is trying to tell me, which might be
> something you want to try to fix. However, I'm pretty sure it's
> supposed to say "fourth" not "forth".
>

I have changed the fourth parameter of VerifyFile(), so my comment over
there is no more valid.

>
> >> and the result would be that everything inside that long if-block is
> >> now at the top level of the function and indented one level less. And
> >> I think if you look at this function you'll see a way that you can
> >> save a *second* level of indentation for much of that code. Please
> >> check the rest of the patch for similar cases, too.
> >
> > Make sense. corrected.
>
> I don't agree. A large chunk of VerifyFile() is still subject to a
> quite unnecessary level of indentation.
>

Yeah, corrected.

>
> > I have added a check for EOF, but not sure whether that woule be right
> here.
> > Do we need to check the length of buffer as well?
>
> That's really, really not right. EOF is not a character that can
> appear in the buffer. It's chosen on purpose to be a value that never
> matches any actual character when both the character and the EOF value
> are regarded as values of type 'int'. That guarantee doesn't apply
> here though because you're dealing with values of type 'char'. So what
> this code is doing is searching for an impossible value using
> incorrect logic, which has very little to do with the actual need
> here, which is to avoid running off the end of the buffer. To see what
> the problem is, try creating a file with no terminating newline, like
> this:
>
> echo -n this file has no terminating newline >> some-file
>
> I doubt it will be very hard to make this patch crash horribly. Even
> if you can't, it seems pretty clear that the logic isn't right.
>
> I don't really know what the \0 tests in NextLine() and NextWord()
> think they're doing either. If there's a \0 in the buffer before you
> add one, it was in the original input data, and pretending like that
> marks a word or line boundary seems like a fairly arbitrary choice.
>
> What I suggest is:
>
> (1) Allocate one byte more than the file size for the buffer that's
> going to hold the file, so that if you write a \0 just after the last
> byte of the file, you don't overrun the allocated buffer.
>
> (2) Compute char *endptr = buf + len.
>
> (3) Pass endptr to NextLine and NextWord and write the loop condition
> something like while (*buf != '\n' && buf < endptr).
>

Thanks for the suggestion. Corrected as per above suggestion.

>
> Other notes:
>
> - The error handling in ReadFileIntoBuffer() does not seem to consider
> the case of a short read. If you look through the source tree, you can
> find examples of how we normally handle that.
>

yeah, corrected.

>
> - Putting string_hash_sdbm() into encode.c seems like a surprising
> choice. What does this have to do with encoding anything? And why is
> it going into src/common at all if it's only intended for frontend
> use?
>
I thought this function can be used in backend as well, i.e: likewise we
are using in simplehash, so kept that in src/common.
After your comment, I have moved this to pg_basebackup.c.
I think this can be kept in common place but not in "srs/common/encode.c"
thoughts?

>
> - It seems like whether or not any problems were found while verifying
> the manifest ought to affect the exit status of pg_basebackup. I'm not
> exactly sure what exit codes ought to be used, but you could look for
> similar precedents. Document this, too.
>
I might be not getting this completely correct, but as per my observation,
if any error occurs, pg_basebackup terminated with exit(1).
Whereas in normal case (without an error), main function returns 0. The
"help" and "version" option terminate normally with exit(0).
So in our case, exit(0) would be appropriate. Please correct me if I
misunderstood anything.

>
> - As much as possible let's have errors in the manifest file report
> the line number, and let's also try to make them more specific, e.g.
> instead of "invalid manifest record found in \"%s\"", perhaps
> "manifest file \"%s\" contains invalid keyword \"%s\" at line %d".
>
yeah, added line number at possible places.

I have also fixed few comments given by Jeevan Chalke offlist.

Please find attached v7 patches and let me know your comments.
--
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

Attachment Content-Type Size
v7-0001-Backup-manifest-with-file-names-sizes-timestamps-.patch application/octet-stream 39.9 KB
v7-0002-Implementation-of-backup-validator.patch application/octet-stream 17.9 KB
v7-0003-Tap-test-case-patch-to-verify-the-backup-using-ve.patch application/octet-stream 7.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikael Kjellström 2020-01-03 13:04:10 Re: sidewinder has one failure
Previous Message Fabien COELHO 2020-01-03 12:07:33 Re: pgbench - allow to create partitioned tables