Re: backup manifests

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2019-12-23 04:32:28
Message-ID: CAGPqQf0duDNnF0Pidy5VDP7_MvRn_v_qX97rP=QR3VFLO3=3=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 20, 2019 at 9:14 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
> <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> > Thank you for review comments.
>
> Thanks for the new version.
>
> + <term><option>--verify-backup </option></term>
>
> Whitespace.
>
> +struct manifesthash_hash *hashtab;
>
> Uh, I had it in mind that you would nuke this line completely, not
> just remove "typedef" from it. You shouldn't need a global variable
> here.
>
> + if (buf == NULL)
>
> pg_malloc seems to have an internal check such that it never returns
> NULL. I don't see anything like this test in other callers.
>
> The order of operations in create_manifest_hash() seems unusual:
>
> + fd = open(manifest_path, O_RDONLY, 0);
> + if (fstat(fd, &stat))
> + buf = pg_malloc(stat.st_size);
> + hashtab = manifesthash_create(1024, NULL);
> ...
> + entry = manifesthash_insert(hashtab, filename, &found);
> ...
> + close(fd);
>
> I would have expected open-fstat-read-close to be consecutive, and the
> manifesthash stuff all done afterwards. In fact, it seems like reading
> the file could be a separate function.
>
> + if (strncmp(checksum, "SHA256", 6) == 0)
>
> This isn't really right; it would give a false match if we had a
> checksum algorithm with a name like SHA2560 or SHA256C or
> SHA256ExceptWayBetter. The right thing to do is find the colon first,
> and then probably overwrite it with '\0' so that you have a string
> that you can pass to parse_checksum_algorithm().
>
> + /*
> + * we don't have checksum type in the header, so need to
> + * read through the first file enttry to find the checksum
> + * type for the manifest file and initilize the checksum
> + * for the manifest file itself.
> + */
>
> This seems to be proceeding on the assumption that the checksum type
> for the manifest itself will always be the same as the checksum type
> for the first file in the manifest. I don't think that's the right
> approach. I think the manifest should always have a SHA256 checksum,
> regardless of what type of checksum is used for the individual files
> within the manifest. Since the volume of data in the manifest is
> presumably very small compared to the size of the database cluster
> itself, I don't think there should be any performance problem there.
>

Agree, that performance won't be a problem, but that will be bit confusing
to the user. As at the start user providing the manifest-checksum (assume
that user-provided CRC32C) and at the end, user will find the SHA256
checksum string in the backup_manifest file.

Does this also means that irrespective of whether user provided a checksum
option or not, we will be always generating the checksum for the
backup_manifest file?

> + filesize = atol(size);
>
> Using strtol() would allow for some error checking.
>
> + * Increase the checksum by its lable length so that we can
> + checksum = checksum + checksum_lable_length;
>
> Spelling.
>
> + pg_log_error("invalid record found in \"%s\"", manifest_path);
>
> Error message needs work.
>
> +VerifyBackup(void)
> +create_manifest_hash(char *manifest_path)
> +nextLine(char *buf)
>
> Your function names should be consistent with the surrounding style,
> and with each other, as far as possible. Three different conventions
> within the same patch and source file seems over the top.
>
> Also keep in mind that you're not writing code in a vacuum. There's a
> whole file of code here, and around that, a whole project.
> scan_data_directory() is a good example of a function whose name is
> clearly too generic. It's not a general-purpose function for scanning
> the data directory; it's specifically a support function for verifying
> a backup. Yet, the name gives no hint of this.
>
> +verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
> + char relative_path[MAXPGPATH], manifesthash_hash *hashtab)
>
> I think I commented on the use of char[] parameters in my previous review.
>
> + /* Skip backup manifest file. */
> + if (strcmp(de->d_name, "backup_manifest") == 0)
> + return;
>
> Still looks like this will be skipped at any level of the directory
> hierarchy, not just the top. And why are we skipping backup_manifest
> here bug pg_wal in scan_data_directory? That's a rhetorical question,
> because I know the answer: verify_file() is only getting called for
> files, so you can't use it to skip directories. But that's not a good
> excuse for putting closely-related checks in different parts of the
> code. It's just going to result in the checks being inconsistent and
> each one having its own bugs that have to be fixed separately from the
> other one, as here. Please try to reorganize this code so that it can
> be done in a consistent way.
>
> I think this is related to the way you're traversing the directory
> tree, which somehow looks a bit awkward to me. At the top of
> scan_data_directory(), you've got code that uses basedir and
> subdirpath to construct path and relative_path. I was initially
> surprised to see that this was the job of this function, rather than
> the caller, but then I thought: well, as long as it makes life easy
> for the caller, it's probably fine. However, I notice that the only
> non-trivial caller is the scan_data_directory() itself, and it has to
> go and construct newsubdirpath from subdirpath and the directory name.
>
> It seems to me that this would get easier if you defined
> scan_data_directory() -- or whatever we end up calling it -- to take
> two pathname-related arguments:
>
> - basepath, which would be $PGDATA and would never change as we
> recurse down, so same as what you're now calling basedir
> - pathsuffix, which would be an empty string at the top level and at
> each recursive level we'd add a slash and then de->d_name.
>
> So at the top of the function we wouldn't need an if statement,
> because you could just do:
>
> snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);
>
> And when you recurse you wouldn't need an if statement either, because
> you could just do:
>
> snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);
>
> What I'd suggest is constructing newpathsuffix right after rejecting
> "." and ".." entries, and then you can reject both pg_wal and
> backup_manifest, at the top-level only, using symmetric and elegant
> code:
>
> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
> "/backup_manifest") == 0)
> continue;
>
> + record = manifesthash_lookup(hashtab, filename);;
> + if (record)
> + {
> ...long block...
> + }
> + else
> + pg_log_info("file \"%s\" is present in backup but not in manifest",
> + filename);
>
> Try to structure the code in such a way that you minimize unnecessary
> indentation. For example, in this case, you could instead write:
>
> if (record == NULL)
> {
> pg_log_info(...)
> return;
> }
>
> and the result would be that everything inside that long if-block is
> now at the top level of the function and indented one level less. And
> I think if you look at this function you'll see a way that you can
> save a *second* level of indentation for much of that code. Please
> check the rest of the patch for similar cases, too.
>
> +static char *
> +nextLine(char *buf)
> +{
> + while (*buf != '\0' && *buf != '\n')
> + buf = buf + 1;
> +
> + return buf + 1;
> +}
>
> I'm pretty sure that my previous review mentioned the importance of
> protecting against buffer overruns here.
>
> +static char *
> +nextWord(char *line)
> +{
> + while (*line != '\0' && *line != '\t' && *line != '\n')
> + line = line + 1;
> +
> + return line + 1;
> +}
>
> Same problem here.
>
> In both cases, ++ is more idiomatic.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Rushabh Lathia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-12-23 05:34:34 Should we rename amapi.h and amapi.c?
Previous Message Michael Paquier 2019-12-23 03:45:00 Re: Proposal: Add more compile-time asserts to expose inconsistencies.