Re: Online checksums verification in the backend

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-10-19 02:39:18
Message-ID: 20201019023918.GB9612@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 16, 2020 at 09:22:02AM +0800, Julien Rouhaud wrote:
> And Michael just told me that I also missed adding one of the C files
> while splitting the patch into two.

+ if (PageIsNew(page))
+ {
+ /*
+ * Check if the page is really new or if there's corruption that
+ * affected PageIsNew detection. Note that PageIsVerified won't try to
+ * detect checksum corruption in this case, so there's no risk of
+ * duplicated corruption report.
+ */
+ if (PageIsVerified(page, blkno))
+ {
+ /* No corruption. */
+ return true;
+ }
Please note that this part of your patch overlaps with a proposal for
a bug fix related to zero-only pages with the checksum verification of
base backups:
https://www.postgresql.org/message-id/608f3476-0598-2514-2c03-e05c7d2b0cbd@postgrespro.ru

Your patch is trying to adapt itself to the existing logic we have in
PageIsVerified() so as you don't get a duplicated report, as does the
base backup part. Note that I cannot find in the wild any open code
making use of PageIsVerified(), but I'd like to believe that it is
rather handy to use for table AMs at least (?), so if we can avoid any
useless ABI breakage, it may be better to have a new
PageIsVerifiedExtended() able to take additional options, one to
report to pgstat and one to generate this elog(WARNING). And then
this patch could just make use of it?

+ /*
+ * There's corruption, but since this affects PageIsNew, we
+ * can't compute a checksum, so set NoComputedChecksum for the
+ * expected checksum.
+ */
+ *chk_expected = NoComputedChecksum;
+ *chk_found = hdr->pd_checksum;
+ return false;
[...]
+ /*
+ * This can happen if corruption makes the block appears as
+ * PageIsNew() but isn't a new page.
+ */
+ if (chk_expected == NoComputedChecksum)
+ nulls[i++] = true;
+ else
+ values[i++] = UInt16GetDatum(chk_expected);
Somewhat related to the first point, NoComputedChecksum exists
because, as the current patch is shaped, we need to report an existing
checksum to the user even for the zero-only case. PageIsVerified() is
not that flexible so we could change it to report a status depending
on the error faced (checksum, header or zero-only) on top of getting a
checksum. Now, I am not completely sure either that it is worth the
complication to return in the SRF of the check function the expected
checksum. So, wouldn't it be better to just rely on PageIsVerified()
(well it's rather-soon-to-be extended flavor) for the checksum check,
the header sanity check and the zero-only check? My point is to keep
a single entry point for all the page sanity checks, so as base
backups, your patch, and the buffer manager apply the same things.
Base backups got that partially wrong because the base backup code
wants to keep control of the number of failures and the error
reports. Your patch actually wishes to report a failure, but you want
to add more context with the fork name and such. Another option we
could use here is to add an error context so as PageIsVerified()
reports the WARNING, but the SQL function provides more context with
the block number and the relation involved in the check.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-10-19 03:15:24 Re: Internal key management system
Previous Message Michael Paquier 2020-10-19 01:08:58 Re: Possible memory leak in pgcrypto with EVP_MD_CTX