Re: Online verification of checksums

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Online verification of checksums
Date:	2019-03-03 06:58:26
Message-ID:	alpine.DEB.2.21.1903030743240.8095@lancre
Lists:	pgsql-hackers

Hello Michaël,

>> I gotta say, my conclusion from this debate is that it's simply a
>> mistake to do this without involvement of the server that can use
>> locking to prevent these kind of issues. It seems pretty absurd to me
>> to have hacky workarounds around partial writes of a live server, around
>> truncation, etc, even though the server has ways to deal with that.
>
> I agree with Andres on this one. We are never going to make this stuff
> safe if we don't handle page reads with the proper locks because of torn
> pages. What I think we should do is provide a SQL function which reads a
> page in shared mode, and then checks its checksum if its LSN is older
> than the previous redo point. This discards cases with rather hot
> pages, but if the page is hot enough then the backend re-reading the
> page would just do the same by verifying the page checksum by itself.
> --
> Michael

My 0.02€ about that, as one of the reviewers of the patch:

I agree that having a server function (extension?) to do a full checksum
verification, possibly bandwidth-controlled, would be a good thing.
However, it would have side effects, such as interfering deeply with the
server page cache, which may or may not be desirable.

On the other hand, I also see value in an independent, system-level external
tool capable of best-effort checksum verification: the current check that
the cluster is offline, which prevents pg_verify_checksum from running, is
kind of artificial, and when the cluster is online, simply counting the
checksum issues that are attributable to the database being online looks
like a reasonable compromise.

So basically I think that allowing pg_verify_checksum to run on an online
cluster is still a good thing, provided that expected errors are correctly
handled.
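
To illustrate what "expected errors are correctly handled" could mean, here
is a minimal Python sketch. It is not the patch under review and not
PostgreSQL code: the page layout (8-byte LSN plus CRC32), toy_checksum,
Report and verify_block are all hypothetical names made up for the example,
and CRC32 stands in for the real page checksum. It only shows the idea of
re-reading on a mismatch and of not reporting pages that are short, still
changing, or newer than the redo point.

```python
import struct
import zlib
from dataclasses import dataclass

PAGE_SIZE = 8192
# Hypothetical toy header: 8-byte LSN followed by a 4-byte CRC32.
HEADER = struct.Struct("<QI")


def toy_checksum(lsn: int, payload: bytes) -> int:
    """Stand-in checksum (CRC32), not PostgreSQL's page checksum."""
    return zlib.crc32(struct.pack("<Q", lsn) + payload)


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a toy page with a valid checksum, zero-padded to PAGE_SIZE."""
    size = PAGE_SIZE - HEADER.size
    payload = payload[:size].ljust(size, b"\0")
    return HEADER.pack(lsn, toy_checksum(lsn, payload)) + payload


def page_lsn(page: bytes) -> int:
    return HEADER.unpack_from(page)[0]


def page_ok(page: bytes) -> bool:
    lsn, stored = HEADER.unpack_from(page)
    return toy_checksum(lsn, page[HEADER.size:]) == stored


@dataclass
class Report:
    verified: int = 0   # checksum matched on the first read
    transient: int = 0  # mismatch that disappeared on re-read (torn read)
    skipped: int = 0    # short, still changing, or newer than the redo point
    corrupt: int = 0    # stable mismatch on a page older than the redo point


def verify_block(read_block, blkno: int, redo_lsn: int, report: Report) -> None:
    """Best-effort check of one block; read_block(blkno) returns raw bytes."""
    first = read_block(blkno)
    if len(first) != PAGE_SIZE:
        # Partial read, e.g. concurrent extension or truncation: expected online.
        report.skipped += 1
        return
    if page_ok(first):
        report.verified += 1
        return
    # A mismatch may just be a torn read of a page being written: read again.
    second = read_block(blkno)
    if len(second) == PAGE_SIZE and page_ok(second):
        report.transient += 1
    elif second != first or page_lsn(second) >= redo_lsn:
        # Still changing, or modified since the redo point: cannot conclude.
        report.skipped += 1
    else:
        report.corrupt += 1
```

With something along these lines, only the corrupt counter would make the
tool report a failure, while transient and skipped would be plain counts of
the expected online-related issues.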

--
Fabien.

In response to

Re: Online verification of checksums at 2019-03-02 23:48:30 from Michael Paquier

Responses

Re: Online verification of checksums at 2019-03-04 03:09:41 from Michael Paquier
