Re: [patch] Fix checksum verification in base backups for zero page headers

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] Fix checksum verification in base backups for zero page headers
Date: 2020-10-07 08:18:52
Message-ID: 20201007081852.GC30037@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 25, 2020 at 08:53:27AM +0200, Michael Banck wrote:
> Oh right, I've fixed up the commit message in the attached V4.

Not much a fan of what's proposed here, for a couple of reasons:
- If the page is not new, we should check if the header is sane or
not.
- It may be better to actually count checksum failures if these are
repeatable in a given block, and report them.
- The code would be more simple with a "continue" added for a block
detected as new or with a LSN newer than the backup start.
- The new error messages are confusing, and I think that these are
hard to translate in a meaningful way.

So I think that we should try to use PageIsVerified() directly in the
code path of basebackup.c, and this requires a couple of changes to
make the routine more extensible:
- Addition of a dboid argument for pgstat_report_checksum_failure(),
whose call needs to be changed to use the *_in_db() flavor.
- Addition of an option to decide if a log should be generated or
not.
- Addition of an option to control if a checksum failure should be
reported to pgstat or not, even if this is actually linked to the
previous point, I have seen cases where being able to control both
separately would be helpful, particularly the log part.

For the last two ones, I would use an extensible argument based on
bits32 with a set of flags that the caller can set at will. Something
else I would recommend here is to use an error context for the case
where we want to log a retry number in the base backup loop that
accepts false positives, with the blkno and the physical file name
when we find failures after retries.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2020-10-07 08:25:44 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Thomas Munro 2020-10-07 08:11:46 Re: Add a log message on recovery startup before syncing datadir