Re: Online checksums verification in the backend

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-10-27 07:07:46
Message-ID: 20201027070746.GD28445@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 23, 2020 at 06:06:30PM +0900, Michael Paquier wrote:
> On Fri, Oct 23, 2020 at 04:31:56PM +0800, Julien Rouhaud wrote:
>> Mmm, is it really an improvement to report warnings during this
>> function execution? Note also that PageIsVerified as-is won't report
>> a warning if a page is found as PageIsNew() but isn't actually all
>> zero, while still being reported as corrupted by the SRF.
>
> Yep, joining the point of above to just have no WARNINGs at all.

Now that we have d401c57, I got to consider more this one, and opted
for not generating a WARNING for now. Hence, PageisVerifiedExtended()
is disabled regarding that, but we still report a checksum failure in
it.

I have spent some time reviewing the tests, and as I felt this was
bulky. In the reworked version attached, I have reduced the number of
tests by half, without reducing the coverage, mainly:
- Removed all the stderr and the return code tests, as we always
expected the commands to succeed, and safe_psql() can do the job
already.
- Merged of the queries using pg_relation_check_pages into a single
routine, with the expected output (set of broken pages returned in the
SRF) in the arguments.
- Added some prefixes to the tests, to generate unique test names.
That makes debug easier.
- The query on pg_stat_database is run once at the beginning, once at
the end with the number of checksum failures correctly updated.
- Added comments to document all the routines, and renamed some of
them mostly for consistency.
- Skipped system relations from the scan of pg_class, making the test
more costly for nothing.
- I ran some tests on Windows, just-in-case.

I have also added a SearchSysCacheExists1() to double-check if the
relation is missing before opening it, added a
CHECK_FOR_INTERRUPTS() within the main check loop (where the error
context is really helpful), indented the code, bumped the catalogs
(mostly a self-reminder), etc.

So, what do you think?
--
Michael

Attachment Content-Type Size
online-checksum-v19.patch text/x-diff 28.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Samokhvalov 2020-10-27 07:12:00 Re: Add important info about ANALYZE after create Functional Index
Previous Message Craig Ringer 2020-10-27 07:07:22 Re: Internal key management system