Re: [patch] Fix checksum verification in base backups for zero page headers

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] Fix checksum verification in base backups for zero page headers
Date: 2020-10-20 06:24:32
Message-ID: 20201020062432.GA30362@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 16, 2020 at 06:02:52PM +0300, Anastasia Lubennikova wrote:
> In the current patch, PageIsVerifed called from pgbasebackup simply doesn't
> Should we change this too? I don't see any difference.

I considered that, but now that does not seem worth bothering here.

> Done.

Thanks for the updated patch. I have played with dd and some fake
files to check the validity of the zero-case (as long as pd_lsn does
not go wild). Here are some comments, with an updated patch attached:
- Added a PageIsVerifiedExtended to make this patch back-patchable,
with the original routine keeping its original shape.
- I did not like much to show the WARNING from PageIsVerified() for
the report, and we'd lose some context related to the base backups.
The important part is to know which blocks from which files are found
as problematic.
- Switched the checks to have PageIsVerified() placed first in the
hierarchy, so as we do all the basic validity checks before a look at
the LSN. This is not really change things logically.
- The meaning of block_retry is also the opposite of what it should be
in the original code? IMO, the flag should be set to true if we still
are fine to retry a block, and set it to false once the one-time retry
has failed.
- Moved the hardcoded failure report threshold of 5 into its own
variable.
- The error strings are not really consistent with the project style
in this area. These are usually not spawned across multiple lines to
ease searches with grep or such.

Anastasia, Michael B, does that look OK to you?

NB: This patch fixes only one problem, the zero-page case, as it was
the intention of Michael B to split this part into its own thread. We
still have, of course, a second problem when it comes to a random LSN
put into the page header which could mask an actual checksum failure
so this only takes care of half the issues. Having a correct LSN
approximation is a tricky problem IMO, but we could improve things by
having some delta with an extra WAL page on top of GetInsertRecPtr().
And this function cannot be used for base backups taken from
standbys.
--
Michael

Attachment Content-Type Size
checksums-zeros-v6.patch text/x-diff 12.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-10-20 06:53:29 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Amit Kapila 2020-10-20 05:29:56 Re: Track statistics for streaming of in-progress transactions