Re: Online verification of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Online verification of checksums
Date: 2018-09-26 15:15:27
Message-ID: 1537974927.3800.41.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

Am Mittwoch, den 26.09.2018, 10:54 -0400 schrieb Stephen Frost:
> * Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
> > Am Mittwoch, den 26.09.2018, 13:23 +0200 schrieb Fabien COELHO:
> > > There are debatable changes of behavior:
> > >
> > > if (errno == ENOENT) return / continue...
> > >
> > > For instance, a file disappearing is ok online, but not so if offline. On
> > > the other hand, the probability that a file suddenly disappears while the
> > > server offline looks remote, so reporting such issues does not seem
> > > useful.
> > >
> > > However I'm more wary with other continues/skips added. ISTM that skipping
> > > a block because of a read error, or because it is new, or some other
> > > reasons, is not the same thing, so should be counted & reported
> > > differently?
> >
> > I think that would complicate things further without a lot of benefit.
> >
> > After all, we are interested in checksum failures, not necessarily read
> > failures etc. so exiting on them (and skip checking possibly large parts
> > of PGDATA) looks undesirable to me.
> >
> > So I have done no changes in this part so far, what do others think
> > about this?
>
> I certainly don't see a lot of point in doing much more than what was
> discussed previously for 'new' blocks (counting them as skipped and
> moving on).
>
> An actual read() error (that is, a failure on a read() call such as
> getting back EIO), on the other hand, is something which I'd probably
> report back to the user immediately and then move on, and perhaps
> report again at the end.
>
> Note that a short read isn't an error and falls under the 'new' blocks
> discussion above.

So I've added ENOENT checks when opening or statting files, i.e. EIO
would still be reported.

The current code in master exits on reads which do not return BLCKSZ,
which I've changed to a skip. So that means we now no longer check for
read failures (return code < 0) so I have now added a check for that and
emit an error message and return.

New version 5 attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachment Content-Type Size
online-verification-of-checksums_V5.patch text/x-patch 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-09-26 15:18:21 Re: Online verification of checksums
Previous Message Fabien COELHO 2018-09-26 15:14:02 Re: Online verification of checksums