Online verification of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Online verification of checksums
Date: 2018-07-26 11:59:33
Message-ID: 1532606373.3422.5.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

v10 almost added online activation of checksums, but all we've got is
pg_verify_checksums, i.e. offline verification of checkums. 

However, we also got (online) checksum verification during base backups,
and I have ported/adapted David Steele's recheck code to my personal
fork of pg_checksums[1], removed the online check (for verification) and
that seems to work fine.

I've now forward-ported this change to pg_verify_checksums, in order to
make this application useful for online clusters, see attached patch.

I've tested this in a tight loop (while true; do pg_verify_checksums -D
data1 -d > /dev/null || /bin/true; done)[2] while doing "while true; do
createdb pgbench; pgbench -i -s 10 pgbench > /dev/null; dropdb pgbench;
done", which I already used to develop the original code in the fork and
which brought up a few bugs.

I got one checksums verification failure this way, all others were
caught by the recheck (I've introduced a 500ms delay for the first ten
failures) like this:

|pg_verify_checksums: checksum verification failed on first attempt in
|file "data1/base/16837/16850", block 7770: calculated checksum 785 but
|expected 5063
|pg_verify_checksums: block 7770 in file "data1/base/16837/16850"
|verified ok on recheck

However, I am also seeing sporadic (maybe 0.5 times per pgbench run)
failures like this:

|pg_verify_checksums: short read of block 2644 in file
|"data1/base/16637/16650", got only 4096 bytes

This is not strictly a verification failure, should we do anything about
this? In my fork, I am also rechecking on this[3] (and I am happy to
extend the patch that way), but that makes the code and the patch more
complicated and I wanted to check the general opinion on this case
first.

Michael

[1] https://github.com/credativ/pg_checksums/commit/dc052f0d6f1282d3c821
5b0eb28b8e7c4e74f9e5
[2] while patching out the somewhat unhelpful (in regular operation,
anyway) debug message for every successful checksum verification
[3] https://github.com/credativ/pg_checksums/blob/master/pg_checksums.c#
L160
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachment Content-Type Size
online-verification-of-checksums_V1.patch text/x-patch 4.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-07-26 12:11:01 Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Previous Message Ashutosh Bapat 2018-07-26 11:39:08 Re: TupleTableSlot abstraction