Re: backup manifests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2019-12-20 15:43:57
Message-ID: CA+TgmoYY5LXfbmeCO3gpCoCJcqNj0125PO7B7ecuJjnoEdB08A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
<suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> Thank you for review comments.

Thanks for the new version.

+ <term><option>--verify-backup </option></term>

Whitespace.

+struct manifesthash_hash *hashtab;

Uh, I had it in mind that you would nuke this line completely, not
just remove "typedef" from it. You shouldn't need a global variable
here.

+ if (buf == NULL)

pg_malloc seems to have an internal check such that it never returns
NULL. I don't see anything like this test in other callers.

The order of operations in create_manifest_hash() seems unusual:

+ fd = open(manifest_path, O_RDONLY, 0);
+ if (fstat(fd, &stat))
+ buf = pg_malloc(stat.st_size);
+ hashtab = manifesthash_create(1024, NULL);
...
+ entry = manifesthash_insert(hashtab, filename, &found);
...
+ close(fd);

I would have expected open-fstat-read-close to be consecutive, and the
manifesthash stuff all done afterwards. In fact, it seems like reading
the file could be a separate function.

+ if (strncmp(checksum, "SHA256", 6) == 0)

This isn't really right; it would give a false match if we had a
checksum algorithm with a name like SHA2560 or SHA256C or
SHA256ExceptWayBetter. The right thing to do is find the colon first,
and then probably overwrite it with '\0' so that you have a string
that you can pass to parse_checksum_algorithm().

+ /*
+ * we don't have checksum type in the header, so need to
+ * read through the first file enttry to find the checksum
+ * type for the manifest file and initilize the checksum
+ * for the manifest file itself.
+ */

This seems to be proceeding on the assumption that the checksum type
for the manifest itself will always be the same as the checksum type
for the first file in the manifest. I don't think that's the right
approach. I think the manifest should always have a SHA256 checksum,
regardless of what type of checksum is used for the individual files
within the manifest. Since the volume of data in the manifest is
presumably very small compared to the size of the database cluster
itself, I don't think there should be any performance problem there.

+ filesize = atol(size);

Using strtol() would allow for some error checking.

+ * Increase the checksum by its lable length so that we can
+ checksum = checksum + checksum_lable_length;

Spelling.

+ pg_log_error("invalid record found in \"%s\"", manifest_path);

Error message needs work.

+VerifyBackup(void)
+create_manifest_hash(char *manifest_path)
+nextLine(char *buf)

Your function names should be consistent with the surrounding style,
and with each other, as far as possible. Three different conventions
within the same patch and source file seems over the top.

Also keep in mind that you're not writing code in a vacuum. There's a
whole file of code here, and around that, a whole project.
scan_data_directory() is a good example of a function whose name is
clearly too generic. It's not a general-purpose function for scanning
the data directory; it's specifically a support function for verifying
a backup. Yet, the name gives no hint of this.

+verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
+ char relative_path[MAXPGPATH], manifesthash_hash *hashtab)

I think I commented on the use of char[] parameters in my previous review.

+ /* Skip backup manifest file. */
+ if (strcmp(de->d_name, "backup_manifest") == 0)
+ return;

Still looks like this will be skipped at any level of the directory
hierarchy, not just the top. And why are we skipping backup_manifest
here bug pg_wal in scan_data_directory? That's a rhetorical question,
because I know the answer: verify_file() is only getting called for
files, so you can't use it to skip directories. But that's not a good
excuse for putting closely-related checks in different parts of the
code. It's just going to result in the checks being inconsistent and
each one having its own bugs that have to be fixed separately from the
other one, as here. Please try to reorganize this code so that it can
be done in a consistent way.

I think this is related to the way you're traversing the directory
tree, which somehow looks a bit awkward to me. At the top of
scan_data_directory(), you've got code that uses basedir and
subdirpath to construct path and relative_path. I was initially
surprised to see that this was the job of this function, rather than
the caller, but then I thought: well, as long as it makes life easy
for the caller, it's probably fine. However, I notice that the only
non-trivial caller is the scan_data_directory() itself, and it has to
go and construct newsubdirpath from subdirpath and the directory name.

It seems to me that this would get easier if you defined
scan_data_directory() -- or whatever we end up calling it -- to take
two pathname-related arguments:

- basepath, which would be $PGDATA and would never change as we
recurse down, so same as what you're now calling basedir
- pathsuffix, which would be an empty string at the top level and at
each recursive level we'd add a slash and then de->d_name.

So at the top of the function we wouldn't need an if statement,
because you could just do:

snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);

And when you recurse you wouldn't need an if statement either, because
you could just do:

snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);

What I'd suggest is constructing newpathsuffix right after rejecting
"." and ".." entries, and then you can reject both pg_wal and
backup_manifest, at the top-level only, using symmetric and elegant
code:

if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
"/backup_manifest") == 0)
continue;

+ record = manifesthash_lookup(hashtab, filename);;
+ if (record)
+ {
...long block...
+ }
+ else
+ pg_log_info("file \"%s\" is present in backup but not in manifest",
+ filename);

Try to structure the code in such a way that you minimize unnecessary
indentation. For example, in this case, you could instead write:

if (record == NULL)
{
pg_log_info(...)
return;
}

and the result would be that everything inside that long if-block is
now at the top level of the function and indented one level less. And
I think if you look at this function you'll see a way that you can
save a *second* level of indentation for much of that code. Please
check the rest of the patch for similar cases, too.

+static char *
+nextLine(char *buf)
+{
+ while (*buf != '\0' && *buf != '\n')
+ buf = buf + 1;
+
+ return buf + 1;
+}

I'm pretty sure that my previous review mentioned the importance of
protecting against buffer overruns here.

+static char *
+nextWord(char *line)
+{
+ while (*line != '\0' && *line != '\t' && *line != '\n')
+ line = line + 1;
+
+ return line + 1;
+}

Same problem here.

In both cases, ++ is more idiomatic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Lorenz 2019-12-20 16:02:01 Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
Previous Message Andreas Karlsson 2019-12-20 15:35:25 Re: More issues with expressions always false (no patch)