Re: Online checksums verification in the backend

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-10-27 11:47:19
Message-ID: CAOBaU_bM2tCRAOpQyTSaMLT--=YLKGsyZ-tzWTn25kGbaqrevA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 27, 2020 at 3:07 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Oct 23, 2020 at 06:06:30PM +0900, Michael Paquier wrote:
> > On Fri, Oct 23, 2020 at 04:31:56PM +0800, Julien Rouhaud wrote:
> >> Mmm, is it really an improvement to report warnings during this
> >> function execution? Note also that PageIsVerified as-is won't report
> >> a warning if a page is found as PageIsNew() but isn't actually all
> >> zero, while still being reported as corrupted by the SRF.
> >
> > Yep, joining the point of above to just have no WARNINGs at all.
>
> Now that we have d401c57, I got to consider more this one, and opted
> for not generating a WARNING for now. Hence, PageisVerifiedExtended()
> is disabled regarding that, but we still report a checksum failure in
> it.

Great, that's also what I had in mind.

> I have spent some time reviewing the tests, and as I felt this was
> bulky. In the reworked version attached, I have reduced the number of
> tests by half, without reducing the coverage, mainly:
> - Removed all the stderr and the return code tests, as we always
> expected the commands to succeed, and safe_psql() can do the job
> already.
> - Merged of the queries using pg_relation_check_pages into a single
> routine, with the expected output (set of broken pages returned in the
> SRF) in the arguments.
> - Added some prefixes to the tests, to generate unique test names.
> That makes debug easier.
> - The query on pg_stat_database is run once at the beginning, once at
> the end with the number of checksum failures correctly updated.
> - Added comments to document all the routines, and renamed some of
> them mostly for consistency.
> - Skipped system relations from the scan of pg_class, making the test
> more costly for nothing.
> - I ran some tests on Windows, just-in-case.
>
> I have also added a SearchSysCacheExists1() to double-check if the
> relation is missing before opening it, added a
> CHECK_FOR_INTERRUPTS() within the main check loop (where the error
> context is really helpful), indented the code, bumped the catalogs
> (mostly a self-reminder), etc.
>
> So, what do you think?

I think it's also worth noting that the IOLock is now acquired just
before getting the buffer state, and released after the read (or after
finding that the buffer is dirty). This is consistent with how it's
done elsewhere, so I'm fine.

Other than that I'm quite happy with the changes you made, thanks a lot!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2020-10-27 12:12:11 Re: Move catalog toast table and index declarations
Previous Message Heikki Linnakangas 2020-10-27 11:43:48 Re: Yet another fast GiST build