Re: [PATCH] Verify Checksums during Basebackups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-02 11:23:58
Message-ID: CABUevEyTJTvn328B6Jb=LdZFZJE6p0MPT=HivSXs-KbxGpqrGw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 28, 2018 at 7:08 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> some installations have data which is only rarely read, and if they are
> so large that dumps are not routinely taken, data corruption would only
> be detected with some large delay even with checksums enabled.
>

I think this is a very common scenario. Particularly when you take into
account indexes and things like that.

> The attached small patch verifies checksums (in case they are enabled)
> during a basebackup. The rationale is that we are reading every block in
> this case anyway, so this is a good opportunity to check them as well.
> Other and complementary ways of checking the checksums are possible of
> course, like the offline checking tool that Magnus just submitted.
>
> It probably makes sense to use the same approach for determining the
> segment numbers as Magnus did in his patch, or refactor that out in a
> utility function, but I'm sick right now so wanted to submit this for
> v11 first.
>
> I did some light benchmarking and it seems that the performance
> degradation is minimal, but this could well be platform or
> architecture-dependent. Right now, the checksums are always checked but
> maybe this could be made optional, probably by extending the replication
> protocol.
>

I think it should be.

I think it would also be a good idea to make this a three-mode setting:
"no check", "check and warning", and "check and error". "Check and error"
should be the default, so that pg_basebackup simply fails on a checksum
error; that makes it glaringly obvious that there is a problem, which is the
main point of checksums in the first place. "Check and warning" would be the
"save whatever is left" mode, and "no check" would turn verification off
completely for cases where performance is what matters.
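
As a rough illustration of the three modes, here is a minimal standalone
sketch. The enum, its values, and handle_block() are hypothetical names made
up for this example; they are not taken from the patch or from PostgreSQL, and
the fprintf() calls only stand in for whatever error reporting the server
would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical setting; names are illustrative, not from the patch. */
typedef enum ChecksumVerifyMode
{
    CHECKSUM_VERIFY_OFF,    /* "no check" */
    CHECKSUM_VERIFY_WARN,   /* "check and warning" */
    CHECKSUM_VERIFY_ERROR   /* "check and error", the suggested default */
} ChecksumVerifyMode;

/*
 * Decide what happens after one block has been read.
 * checksum_ok is the result of verifying that block (ignored when the
 * mode is OFF). Returns true if the backup should continue, false if it
 * should be aborted.
 */
static bool
handle_block(ChecksumVerifyMode mode, bool checksum_ok, unsigned blkno)
{
    if (mode == CHECKSUM_VERIFY_OFF || checksum_ok)
        return true;

    if (mode == CHECKSUM_VERIFY_WARN)
    {
        /* Report the problem but keep sending data. */
        fprintf(stderr, "WARNING: checksum mismatch in block %u\n", blkno);
        return true;
    }

    /* Only the strict mode is left at this point. */
    assert(mode == CHECKSUM_VERIFY_ERROR);
    fprintf(stderr, "ERROR: checksum mismatch in block %u\n", blkno);
    return false;
}
```

With the strict mode as the default, a caller would stop the backup as soon as
handle_block() returns false, while the warning mode still produces a
complete, if suspect, copy.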

Another quick note -- we need to assert that the size of the buffer is
actually divisible by BLCKSZ. I don't think it's a common scenario, but it
could break badly if somebody changes BLCKSZ. Either that or perhaps just
change the TARSENDSIZE to be a multiple of BLCKSZ.
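
Both options can be sketched in a few lines. The values below are stand-ins
chosen for illustration (in a real build BLCKSZ comes from the configure
step), TARSENDSIZE follows the spelling used above, and blocks_in_chunk() is a
hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in; a real build gets BLCKSZ from configure. */
#ifndef BLCKSZ
#define BLCKSZ 8192
#endif

/* Second option: define the send buffer size as a multiple of BLCKSZ. */
#define TARSENDSIZE (4 * BLCKSZ)

/* First option: refuse to compile if the buffer cannot hold whole blocks. */
_Static_assert(TARSENDSIZE % BLCKSZ == 0,
               "TARSENDSIZE must be a multiple of BLCKSZ");

/*
 * Hypothetical helper: number of complete blocks in a chunk of nbytes
 * read into the send buffer. A trailing partial block is not counted.
 */
static size_t
blocks_in_chunk(size_t nbytes)
{
    assert(nbytes <= TARSENDSIZE);
    return nbytes / BLCKSZ;
}
```

The compile-time check costs nothing at run time and fails the build
immediately if someone picks a BLCKSZ that no longer divides the buffer size.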

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
