Re: [PATCH] Verify Checksums during Basebackups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-04-03 11:52:21
Message-ID: CABUevEz4GCyrOr_o48tN-Yvpu61WN4vTLO2W46TXvRKi5L04Qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 2, 2018 at 2:48 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi!
>
> On Sun, Apr 01, 2018 at 05:27:21PM +0200, Magnus Hagander wrote:
> > It might be a micro-optimisation, but ISTM that we shouldn't do the
> > basename(palloc(fn)) in is_heap_file() unless we actually need it -- so
> not
> > at the top of the function. (And surely "." and ".." should not occur
> here?
> > I think that's a result of copy/paste from the online checksum patch?
> >
> > We also do exactly the same basename(palloc(fn)) in sendFile(). Can we
> > find a way to reuse that duplication? Perhaps we want to split it into
> the
> > two pieces already out in sendFile() and pass it in as separate
> parameters?
>
> I've done the latter now, although it looks a bit weird that the second
> argument data is a subset of the first. I couldn't find another way to
> not have it done twice, though.
>

I agree, but I think it's still cleaner.

On further look, there is actually no need to pstrdup() at all -- we never
used the modified part of the string anyway, because we don't care about
the oid (unlike pg_verify_checksums).

So I adjusted the patch by that.

> If not then this second one should at least only be done inside the if
> > (verify_checksums).
>
> We can't have both, as we need to call the is_heap_file() function in
> order to determine whether we should verify the checksums.
>

Right. I realize that -- thus the "if not". But I guess I was not clear in
what I meant -- see attached file for it.

> There is a bigger problem next to that though -- I believe basename()
> does
> > not exist on Win32. I haven't tested it, but there is zero documentation
> of
> > it existing, which usually indicates it doesn't. That's the part that
> > definitely needs to get fixed.
> >
> > I think you need to look into the functionality in port/path.c, in
> > particular last_dir_separator()?
>
> Thanks for the pointer, I've used that now; I mentioned before that
> basename() might be a portability hazard, but couldn't find a good
> substitute myself.
>

Yeah, I have a recollection somewhere of running into this before, but I
couldn't find any references. But the complete lack of docs about it on
msdn.microsoft.com is a clear red flag :)

>
> V6 of the patch is attached.
>
>
Excellent. I've done some mangling on it:

* Changed the is_heap_file to is_checksummed_file (and the associtaed
struct name), because this is really what it's about (we for example verify
checksums on indexes, which are clearly not heaps)
* Moved the list of files to the top of the file next to the other lists of
files/directories
* Added missing function prototype at the top, and changed the parameter
names to be a bit more clear
* Added some comments
* Changed the logic around the checksum-check to avoid the pstrdup() and to
not call the path functions unless necessary (per comment above)
* "filen" -> "file" in message
* xlog.h does not need to be included
* pgindent

Remaining question:

The check for (cnt % BLCKSZ != 0) currently does "continue", which means
that this block of data isn't actually sent to the client at all, which
seems completely wrong. We only want to prevent checksum validations.

I have moved the check up a bit, and refactored it so it continues to do
the actual transmission of the file if this path is hit.

I have pushed an updated patch with those changes. Please review the result
and let me know I broke something :)

Thanks!!

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2018-04-03 11:52:26 pgsql: Validate page level checksums in base backups
Previous Message Dmitry Ivanov 2018-04-03 11:28:37 Re: new function for tsquery creartion