Re: [PATCH] Verify Checksums during Basebackups

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-05 21:19:49
Message-ID: 66ee3722-54bb-6c63-46b5-7acb326b876d@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

On 3/5/18 6:36 AM, Stephen Frost wrote:
> * Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
>
>> So I guess this would have to be sent back via the replication protocol,
>> but I don't see an off-hand way to do this easily?
>
> The final ordinary result set could be extended to include the
> information about checksum failures..? I'm a bit concerned about what
> to do when there are a lot of checksum failures though.. Ideally, you'd
> identify all of the pages in all of the files where a checksum failed
> (just throwing an error such as the one proposed above is really rather
> terrible since you have no idea what block, or even what table, failed
> the checksum...).

I agree that knowing the name of the file that failed validation is
really important, with a list of the pages that failed validation being
a nice thing to have as well, though I would be fine having the latter
added in a future version.

For instance, in pgBackRest we output validation failures this way:

[from a regression test]
WARN: invalid page checksums found in file
[TEST_PATH]/db-primary/db/base/base/32768/33001 at pages 0, 3-5, 7

Note that we collate ranges of errors to keep the output from being too
overwhelming.

I think the file names are very important because there's a rather large
chance that corruption may happen in an index, unlogged table, or
something else that can be rebuilt or reloaded. Knowing where the
corruption is can save a lot of headaches.

> Reviewing the original patch and considering this issue, I believe there
> may be a larger problem- while very unlikely, there's been concern that
> it's possible to read a half-written page (and possibly only the second
> half) and end up with a checksum failure due to that. In pgBackRest, we
> address that by doing another read of the page and by checking the LSN
> vs. where we started the backup (if the LSN is more recent than when the
> backup started then we don't have to care about the page- it'll be in
> the WAL).

The need to reread pages can be drastically reduced by skipping
validation of any page that has an LSN >= the backup start LSN because
they will be replayed from WAL during recovery.

The rereads are still necessary because of the possible transposition of
page read vs. page write as Stephen notes above. We have not been able
to reproduce this case but can't discount it.

Regards,
--
-David
david(at)pgmasters(dot)net

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-03-05 21:19:52 Re: JIT compiling with LLVM v11
Previous Message Robert Haas 2018-03-05 20:59:36 Re: [HACKERS] Partition-wise aggregation/grouping