Re: [patch] Fix checksum verification in base backups for zero page headers

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] Fix checksum verification in base backups for zero page headers
Date: 2020-09-22 14:26:31
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Am Mittwoch, den 02.09.2020, 16:50 +0300 schrieb Anastasia Lubennikova:
> On 01.09.2020 13:22, Michael Banck wrote:
> > as a continuation of [1], I've split-off the zero page header case from
> > the last patch, as this one seems less contentious.
> > [1]
> I've looked through the previous discussion. As far as I got it, most of
> the controversy was about online checksums improvements.
> The warning about pd_upper inconsistency that you've added is a good
> addition. The patch is a bit messy, though, because a huge code block
> was shifted.
> Will it be different, if you just leave
> "if (!PageIsNew(page) && PageGetLSN(page) < startptr)"
> block as it was, and add
> "else if (PageIsNew(page) && !PageIsZero(page))" ?

Thanks, that does indeed look better as a patch and I think it's fine
as-is for the code as well, I've attached a v2.

> While on it, I also have a few questions about the code around:

All fair points, but I think those should be handled in another patch,
if any.

> 1) Maybe we can use other header sanity checks from PageIsVerified() as
> well? Or even the function itself.

Yeah, I'll keep that in mind.

> 2) > /* Reset loop to validate the block again */
> How many times do we try to reread the block? Is one reread enough?

Not sure whether more rereads would help, but I think the general
direction was to replace this with something better anyway I think (see

> Speaking of which, 'reread_cnt' name looks confusing to me. I would
> expect that this variable contains a number of attempts, not the number
> of bytes read.

Well, there are other "cnt" variables that keep the number of bytes read
in that file, so it does not seem to be out of place for me. A comment
might not hurt though.

On second glance, it should maybe also be of size_t and not int type, as
is the other cnt variable?

> If everyone agrees, that for basebackup purpose it's enough to rely on a
> single reread, I'm ok with it.
> Another approach is to read the page directly from shared buffers to
> ensure that the page is fine. This way is more complicated, but should
> be almost bullet-proof. Using it we can also check pages with lsn >=
> startptr.

Right, that's what Andres also advocated for initially I think, and what
should be done going forward.

> 3) Judging by warning messages, we count checksum failures per file, not
> per page, and don't report after a fifth failure. Why so? Is it a
> common case that so many pages of one file are corrupted?

I think this was a request during the original review, on the basis that
if we have more than a few checksum errors, it's likely there is
something fundamentally broken and it does not make sense to spam the
log with potentially millions of broken checksums.



Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen:

Attachment Content-Type Size
0001-Fix-checksum-verification-in-base-backups-for-zero-p.V2.patch text/x-patch 6.3 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2020-09-22 14:30:19 Re: [patch] Fix checksum verification in base backups for zero page headers
Previous Message Tom Lane 2020-09-22 14:18:40 Re: pgindent vs dtrace on macos