Re: [PATCH] Verify Checksums during Basebackups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-22 14:07:36
Message-ID: CABUevEyfwOg97PZXvw-wOGLSXK5ZHxx=Nx4CqozoDdUj7Rd-aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 17, 2018 at 10:34 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > Possibly open questions:
> >
> > 1. I have not so far changed the replication protocol to make verifying
> > checksums optional. I can go about that next if the consensus is that we
> > need such an option (and cannot just check it everytime)?
>
> I think most people (including those I had off-list discussions about
> this with) were of the opinion that such an option should be there, so I
> added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> replication command and also an option -k / --no-verify-checksums to
> pg_basebackup to trigger this.
>
> Updated patch attached.
>
>
Notes:

+ if (checksum_failure == true)

Really, just if(checksum_failure)

+ errmsg("checksum mismatch during basebackup")));

Should be "base backup" in messages.

+static const char *skip[] = {

I think that needs a much better name than just "skip". Skip for what? In
particular since we are just skipping it for checksums, and not for the
actual basebackup, that name is actively misinforming.

+ filename = basename(pstrdup(readfilename));
+ if (!noverify_checksums && DataChecksumsEnabled() &&
+ !skipfile(filename) &&
+ (strncmp(readfilename, "./global/", 9) == 0 ||
+ strncmp(readfilename, "./base/", 7) == 0 ||
+ strncmp(readfilename, "/", 1) == 0))
+ verify_checksum = true;

I would include the checks for global, base etc into the skipfile()
function as well (also renamed).

+ * Only check pages which have not been modified since the
+ * start of the base backup.

I think this needs a description of why, as well (without having read this
thread, this is a pretty subtle case).

+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000',
'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";

This part of the test will surely fail on Windows, not having a /dev/zero.
Can we easily implement this part natively in perl perhaps? Somebody who
knows more about which functionality is OK to use within this system can
perhaps comment?

Most of that stuff is trivial and can be cleaned up at commit time. Do you
want to send an updated patch with a few of those fixes, or should I clean
it?

The test thing is a stopper until we figure that one out though. And while
at it -- it seems we don't have any tests for the checksum feature in
general. It would probably make sense to consider that at the same time as
figuring out the right way to do this one.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-03-22 14:19:12 Re: missing support of named convention for procedures
Previous Message John Naylor 2018-03-22 14:03:29 Re: WIP: a way forward on bootstrap data