Re: [PATCH] Verify Checksums during Basebackups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-04-03 11:59:07
Message-ID: CABUevEy-abw5HyjpQvJJQKEv3r46wGHesgfqxqpYrhucSKnbyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 3, 2018 at 1:52 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

>
>
> On Mon, Apr 2, 2018 at 2:48 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
> wrote:
>
>> Hi!
>>
>> On Sun, Apr 01, 2018 at 05:27:21PM +0200, Magnus Hagander wrote:
>> > It might be a micro-optimisation, but ISTM that we shouldn't do the
>> > basename(palloc(fn)) in is_heap_file() unless we actually need it -- so
>> not
>> > at the top of the function. (And surely "." and ".." should not occur
>> here?
>> > I think that's a result of copy/paste from the online checksum patch?
>> >
>> > We also do exactly the same basename(palloc(fn)) in sendFile(). Can we
>> > find a way to reuse that duplication? Perhaps we want to split it into
>> the
>> > two pieces already out in sendFile() and pass it in as separate
>> parameters?
>>
>> I've done the latter now, although it looks a bit weird that the second
>> argument data is a subset of the first. I couldn't find another way to
>> not have it done twice, though.
>>
>
> I agree, but I think it's still cleaner.
>
> On further look, there is actually no need to pstrdup() at all -- we never
> used the modified part of the string anyway, because we don't care about
> the oid (unlike pg_verify_checksums).
>
> So I adjusted the patch by that.
>
>
> > If not then this second one should at least only be done inside the if
>> > (verify_checksums).
>>
>> We can't have both, as we need to call the is_heap_file() function in
>> order to determine whether we should verify the checksums.
>>
>
> Right. I realize that -- thus the "if not". But I guess I was not clear in
> what I meant -- see attached file for it.
>
>
> > There is a bigger problem next to that though -- I believe basename()
>> does
>> > not exist on Win32. I haven't tested it, but there is zero
>> documentation of
>> > it existing, which usually indicates it doesn't. That's the part that
>> > definitely needs to get fixed.
>> >
>> > I think you need to look into the functionality in port/path.c, in
>> > particular last_dir_separator()?
>>
>> Thanks for the pointer, I've used that now; I mentioned before that
>> basename() might be a portability hazard, but couldn't find a good
>> substitute myself.
>>
>
> Yeah, I have a recollection somewhere of running into this before, but I
> couldn't find any references. But the complete lack of docs about it on
> msdn.microsoft.com is a clear red flag :)
>
>
>
>>
>> V6 of the patch is attached.
>>
>>
> Excellent. I've done some mangling on it:
>
> * Changed the is_heap_file to is_checksummed_file (and the associtaed
> struct name), because this is really what it's about (we for example verify
> checksums on indexes, which are clearly not heaps)
> * Moved the list of files to the top of the file next to the other lists
> of files/directories
> * Added missing function prototype at the top, and changed the parameter
> names to be a bit more clear
> * Added some comments
> * Changed the logic around the checksum-check to avoid the pstrdup() and
> to not call the path functions unless necessary (per comment above)
> * "filen" -> "file" in message
> * xlog.h does not need to be included
> * pgindent
>
> Remaining question:
>
> The check for (cnt % BLCKSZ != 0) currently does "continue", which means
> that this block of data isn't actually sent to the client at all, which
> seems completely wrong. We only want to prevent checksum validations.
>
> I have moved the check up a bit, and refactored it so it continues to do
> the actual transmission of the file if this path is hit.
>
>
And of course I forgot that particular part in the first push, so I've
pushed it as a separate commit.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-04-03 12:02:50 Re: [HACKERS] path toward faster partition pruning
Previous Message Arthur Zakirov 2018-04-03 11:57:22 Re: [PROPOSAL] Shared Ispell dictionaries