Re: Online verification of checksums

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Online verification of checksums
Date: 2018-09-26 15:14:02
Message-ID: alpine.DEB.2.21.1809261703520.22248@lancre
Lists: pgsql-hackers


>> The patch is missing a documentation update.
>
> I've added that now. I think the only change needed was removing the
> "server needs to be offline" part?

Yes, and also checking that the described behavior corresponds to the new
version.

>> There are debatable changes of behavior:
>>
>> if (errno == ENOENT) return / continue...
>>
>> For instance, a file disappearing is ok online, but not so if offline. On
>> the other hand, the probability that a file suddenly disappears while the
>> server is offline looks remote, so reporting such issues does not seem
>> useful.
>>
>> However I'm more wary of the other continues/skips added. ISTM that skipping
>> a block because of a read error, or because it is new, or some other
>> reasons, is not the same thing, so should be counted & reported
>> differently?
>
> I think that would complicate things further without a lot of benefit.
>
> After all, we are interested in checksum failures, not necessarily read
> failures etc., so exiting on them (and skipping the check of possibly large
> parts of PGDATA) looks undesirable to me.

Hmmm.

I'm really saying that it is debatable, so here is some fuel for the
debate:

If I run the check command and it cannot do its job, there is a problem
which is as bad as a failing checksum. The only safe assumption about a
block that cannot be read is that its checksum is bad... So ISTM that for
some of the "skipped" errors there should be an appropriate report (exit
code, final output) that something is amiss.
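
As a rough illustration of the point above, here is a minimal sketch in C
of counting each kind of skip separately and reflecting it in the exit
code. Everything in it is hypothetical and not taken from the patch: the
names (ScanStats, count_block, count_open_failure, scan_exit_code), the
choice to treat ENOENT as benign only when online, and the exit code
values 0/1/2.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-run counters; none of these names come from the patch. */
typedef struct ScanStats
{
    long blocks_ok;       /* checksum verified */
    long bad_checksums;   /* checksum mismatch */
    long read_errors;     /* block or file could not be read */
    long new_pages;       /* new (all-zero) page, nothing to verify */
    long files_vanished;  /* ENOENT while the server is online */
} ScanStats;

typedef enum BlockOutcome
{
    BLOCK_OK,
    BLOCK_BAD_CHECKSUM,
    BLOCK_READ_ERROR,
    BLOCK_NEW
} BlockOutcome;

/* Count one block under the reason it was accepted, rejected or skipped. */
static void
count_block(ScanStats *s, BlockOutcome outcome)
{
    switch (outcome)
    {
        case BLOCK_OK:           s->blocks_ok++;     break;
        case BLOCK_BAD_CHECKSUM: s->bad_checksums++; break;
        case BLOCK_READ_ERROR:   s->read_errors++;   break;
        case BLOCK_NEW:          s->new_pages++;     break;
    }
}

/*
 * Count a failed open. Assumption: a file disappearing is expected only
 * while the server is online; any other failure is a read error.
 */
static void
count_open_failure(ScanStats *s, int err, bool online)
{
    if (err == ENOENT && online)
        s->files_vanished++;
    else
        s->read_errors++;
}

/* Assumed convention: 0 = clean, 1 = bad checksum, 2 = could not check all. */
static int
scan_exit_code(const ScanStats *s)
{
    if (s->bad_checksums > 0)
        return 1;
    if (s->read_errors > 0)
        return 2;
    return 0;
}

/* Final output: every category is reported, so no skip is silent. */
static void
print_report(const ScanStats *s, FILE *out)
{
    fprintf(out, "Blocks verified: %ld\n", s->blocks_ok);
    fprintf(out, "Bad checksums:   %ld\n", s->bad_checksums);
    fprintf(out, "Read errors:     %ld\n", s->read_errors);
    fprintf(out, "New pages:       %ld\n", s->new_pages);
    fprintf(out, "Files vanished:  %ld\n", s->files_vanished);
}
```

With this kind of accounting the scan can still continue past an unreadable
block, as Michael prefers, while the final output and a non-zero exit code
make it visible that part of PGDATA was not actually verified.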

--
Fabien.
