Re: Online verification of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Online verification of checksums
Date: 2019-03-28 20:09:22
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Am Donnerstag, den 28.03.2019, 18:19 +0100 schrieb Tomas Vondra:
> On Thu, Mar 28, 2019 at 05:08:33PM +0100, Michael Banck wrote:
> > I also fixed the two issues Andres reported, namely a zeroed-out
> > pageheader and a random LSN. The first is caught be checking for an all-
> > zero-page in the way PageIsVerified() does. The second is caught by
> > comparing the upper 32 bits of the LSN as well and demanding that they
> > are equal. If the LSN is corrupted, the upper 32 bits should be wildly
> > different to the current checkpoint LSN.
> >
> > Well, at least that is a stab at a fix; there is a window where the
> > upper 32 bits could legitimately be different. In order to make that as
> > small as possible, I update the checkpoint LSN every once in a while.

I decided it makes more sense to just re-read the checkpoint LSN from
the control file when we encounter a wrong checksum on re-read of a page
as that is when it counts, instead of doing it only every once in a

> Doesn't that mean we'll report a false positive?

A false positive would be pg_checksums claiming a block has a wrong
checksum while in fact it does not (after it is correctly written out
and synced to disk), right?

If pg_checksums reads a current first part and a stale second part twice
in a row (we re-read the block), then the LSN of the first part would
presumably(?) be higher than the latest checkpoint LSN. If there was a
wraparound in the lower part of the LSN so that the upper part is now
different to the latest checkpoint LSN, then pg_checksums would report
this as a false positive I believe. 

We could add some additional heuristics like checking the upper part of
the LSN has advanced by at most one but that does not seem to make it
100% certified robust either, does it?

If pg_checksums reads a current second part and a stale first part
twice, then the pageheader LSN would presumably be lower than the
checkpoint LSN and again a false positive would be reported.

At least in my testing I haven't seen the second case and the first
(disregarding the wraparound issue for now) extremely rarely if at all
(usually the torn page is gone on re-read). The first case requiring a
wraparound since the latest checkpointLSN update also seems quite narrow
compared to the issue of random data being written due to corruption. So
I think it is more important to make sure random data won't be a false
negative than this being a false positive.

Maybe we can just issue a warning in online mode that some checksum
failures could be false positives and advise the user to recheck those
files (using the -r switch) again? I have added this in the attached new

+ printf(_("%s ran against an online cluster and found some bad checksums.\n"), progname);
+ printf(_("It could be that those are false positives due concurrently updated blocks,\n"));
+ printf(_("checking the offending files again with the -r option is advised.\n"));

It was not mentioned on this thread, but I want to stress again that you
cannot run the current pg_checksums on a basebackup due to the control
file claiming it is still online. This makes the current program pretty
useless for production setups right now in my opinion as few people have
the luxury of regular maintenance downtimes when pg_checksums could run
and running it against base backups is quite cumbersome.

Maybe we can improve things by checking for the as well
and going ahead (only for --check of course) if it is missing, but that
hasn't been implemented yet.

I agree that the current patch might have some corner-cases where it
does not guarantee 100% accuracy in online mode, but I hope the current
version at least has no more false negatives.


Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen:

Attachment Content-Type Size
online-verification-of-checksums_V18.patch text/x-patch 17.5 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-03-28 20:11:40 Re: Online verification of checksums
Previous Message Robert Haas 2019-03-28 19:59:00 Re: patch to allow disable of WAL recycling