Re: Online verification of checksums

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Online verification of checksums
Date: 2020-10-30 02:30:28
Message-ID: 20201030023028.GC1693@paquier.xyz
Lists: pgsql-hackers

On Thu, Oct 22, 2020 at 10:41:53AM +0900, Michael Paquier wrote:
> We cannot trust the fields of the page header because they may
> have been messed up by random corruption, so what really
> matters is whether the checksums match, and we can rely on that
> alone. The zero-only case of a page is different because such
> pages don't have a checksum set, so I would finish with something like the
> attached to make the detection more robust. This does not make the
> detection perfect as there is no locking guarantee (we really need to
> do that but v13 has been released already), but with a sufficient
> number of retries this can make things much more reliable than what
> is currently there.
>
> Are there any comments? Anybody?
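The rule described above (trust only the checksum comparison, and accept an all-zero page since it carries no checksum) can be sketched as follows. This is a minimal, self-contained illustration, not PostgreSQL's code: `sketch_checksum()` is a hypothetical FNV-1a-style stand-in for the real `pg_checksum_page()`, and the checksum is assumed to be stored as a 16-bit value at byte offset 8 of an 8192-byte page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ       8192  /* PostgreSQL's default block size */
#define CHECKSUM_OFF 8     /* assumed offset of the 16-bit stored checksum */

/* Hypothetical stand-in for the real page checksum: FNV-1a over the page,
 * reading the stored checksum bytes as zero, mixed with the block number. */
static uint16_t sketch_checksum(const uint8_t *page, uint32_t blkno)
{
    uint32_t h = 2166136261u;                    /* FNV-1a offset basis */
    for (size_t i = 0; i < BLCKSZ; i++) {
        bool in_cksum = (i == CHECKSUM_OFF || i == CHECKSUM_OFF + 1);
        h ^= in_cksum ? 0 : page[i];
        h *= 16777619u;                          /* FNV-1a prime */
    }
    h ^= blkno;                                  /* tie the result to the block */
    return (uint16_t) ((h % 65535) + 1);         /* never 0, so 0 means "unset" */
}

static uint16_t stored_checksum(const uint8_t *page)
{
    uint16_t v;
    memcpy(&v, page + CHECKSUM_OFF, sizeof v);
    return v;
}

static void set_checksum(uint8_t *page, uint32_t blkno)
{
    uint16_t v = sketch_checksum(page, blkno);
    memcpy(page + CHECKSUM_OFF, &v, sizeof v);
}

static bool page_is_all_zero(const uint8_t *page)
{
    for (size_t i = 0; i < BLCKSZ; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* A page passes if it is entirely zero (never initialized, hence no
 * checksum) or if its stored checksum matches. Header fields such as the
 * LSN are deliberately not consulted: corruption may have hit them too. */
static bool page_verify(const uint8_t *page, uint32_t blkno)
{
    if (page_is_all_zero(page))
        return true;
    return stored_checksum(page) == sketch_checksum(page, blkno);
}
```

In this sketch, a corruption that zeroes a whole block passes verification, since an all-zero page is accepted without any checksum comparison.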

So, hearing nothing, attached is a set of patches that I would like to
apply to 11 and later branches to address the issues discussed in this
thread. It comes in two parts:
- Some refactoring of PageIsVerified(), similar to d401c57 on HEAD
except that this keeps ABI compatibility.
- The actual patch, with tweaks for each stable branch.
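One common way to extend a function in a stable branch without breaking ABI, presumably the spirit of the first patch, is to add a new entry point that takes flags and keep the old symbol as a thin wrapper passing the previous default behavior. The sketch below uses hypothetical names (`page_is_verified`, `page_is_verified_ext`, `VERIFY_LOG_WARNING`, `VERIFY_REPORT_STAT`) and reduces the checksum test to a boolean input; it is not the code of d401c57 or of the attached patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define VERIFY_LOG_WARNING 0x01  /* hypothetical flag: emit a WARNING */
#define VERIFY_REPORT_STAT 0x02  /* hypothetical flag: bump a failure counter */

static int warnings_logged = 0;
static int failure_stat = 0;

/* New entry point: the caller chooses the side effects of a failure. */
static bool page_is_verified_ext(bool checksum_ok, unsigned blkno, int flags)
{
    if (checksum_ok)
        return true;
    if (flags & VERIFY_LOG_WARNING) {
        warnings_logged++;
        fprintf(stderr, "WARNING: page verification failed in block %u\n", blkno);
    }
    if (flags & VERIFY_REPORT_STAT)
        failure_stat++;
    return false;
}

/* Old entry point keeps its exact signature and behavior, so code built
 * against an earlier minor release still links and works unchanged. */
bool page_is_verified(bool checksum_ok, unsigned blkno)
{
    return page_is_verified_ext(checksum_ok, blkno,
                                VERIFY_LOG_WARNING | VERIFY_REPORT_STAT);
}
```

In this sketch, a caller that retries, such as a base backup, can pass 0 and decide itself what to report once retries are exhausted, while existing callers of the old function see no change.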

In my tests using dd to write random pages, the patch detects random
corruptions, relying on a wait/retry loop when a failure is detected.
As mentioned upthread, this is a double-edged sword: increasing the
number of retries reduces the chances of false positives, at the cost
of making regression tests longer. The patch uses up to 5 retries for
each page, with 100ms of sleep for each retry. (I am aware that the
commit message of the main patch is not written yet.)
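The wait/retry loop described above can be sketched as follows, assuming one initial attempt plus at most 5 retries per page, each retry preceded by a 100 ms sleep (an interpretation of the text, not taken from the patch). The read-and-verify step is abstracted behind a callback, and `flaky_page` is a hypothetical simulator of a page whose first reads look torn because of a concurrent write. With these numbers, a page that keeps failing costs 500 ms of sleep, which is the regression-test cost mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define MAX_RETRIES    5    /* retries after the first attempt */
#define RETRY_SLEEP_MS 100  /* pause before each retry */

/* Reads block `blkno` and returns true if it passes verification. */
typedef bool (*read_verify_fn)(void *ctx, unsigned blkno);

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Report a failure only if every attempt fails: a single mismatch may just
 * be a torn read of a page that is being written concurrently. */
static bool verify_with_retries(read_verify_fn read_verify, void *ctx,
                                unsigned blkno)
{
    for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        if (attempt > 0)
            sleep_ms(RETRY_SLEEP_MS);
        if (read_verify(ctx, blkno))
            return true;
    }
    return false;
}

/* Hypothetical simulator: the first `bad_reads_left` reads look corrupt. */
struct flaky_page {
    int bad_reads_left;
    int reads;
};

static bool flaky_read_verify(void *ctx, unsigned blkno)
{
    struct flaky_page *p = ctx;
    (void) blkno;
    p->reads++;
    if (p->bad_reads_left > 0) {
        p->bad_reads_left--;
        return false;
    }
    return true;
}
```

The trade-off from the mail shows up directly in the two constants: raising `MAX_RETRIES` makes a transient failure less likely to be reported, but multiplies the worst-case wait for each genuinely corrupt page.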
--
Michael

Attachment Content-Type Size
v8-master-0001-Fix-page-verifications-in-base-backups.patch text/x-diff 8.4 KB
v8-13-0001-Extend-PageIsVerified-to-handle-more-custom-optio.patch text/x-diff 6.8 KB
v8-13-0002-Fix-page-verification-in-base-backups.patch text/x-diff 9.3 KB
v8-12-0001-Extend-PageIsVerified-to-handle-more-custom-optio.patch text/x-diff 6.0 KB
v8-12-0002-Fix-page-verification-in-base-backups.patch text/x-diff 9.3 KB
v8-11-0001-Extend-PageIsVerified-to-handle-more-custom-optio.patch text/x-diff 5.1 KB
v8-11-0002-Fix-page-verification-in-base-backups.patch text/x-diff 9.3 KB
