Re: Online verification of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-03 10:51:48
Message-ID: 1551610308.4947.34.camel@credativ.de
Lists: pgsql-hackers

Hi,

On Saturday, 2019-03-02 at 11:08 -0500, Stephen Frost wrote:
> * Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
> > On Friday, 2019-03-01 at 18:03 -0500, Robert Haas wrote:
> > > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> > > <michael(dot)banck(at)credativ(dot)de> wrote:
> > > > I have added a retry for this as well now, without a pg_sleep() as well.
> > > > This catches around 80% of the half-reads, but a few slip through. At
> > > > that point we bail out with exit(1), and the user can try again, which I
> > > > think is fine?
> > >
> > > Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> > > robust at all.
> >
> > The chance that pg_verify_checksums hits a torn page (at least in my
> > tests, see below) is already pretty low, a couple of times per 1000
> > runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
> > on. Otherwise, we now just issue a warning and skip the file (or so was
> > the idea, see below), do you think that is not acceptable?
> >
> > I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
> > pg_verify_checksums in tight loops) with the current patch version, and
> > I am seeing short reads very, very rarely (maybe every 1000th run) with
> > a warning like:
> >
> > > 1174
> > > pg_verify_checksums: warning: could not read block 374 in file "data/base/18032/18045": read 4096 of 8192
> > > pg_verify_checksums: warning: could not read block 375 in file "data/base/18032/18045": read 4096 of 8192
> > > Files skipped: 2
> >
> > The 1174 is the sequence number, the first 1173 runs of
> > pg_verify_checksums only skipped blocks.
> >
> > However, the fact it shows two warnings for the same file means there is
> > something wrong here. It was continuing to the next block while I think
> > it should just skip to the next file on read failures. So I have changed
> > that now, new patch attached.
>
> I'm confused - if previously it was continuing to the next block instead
> of doing the re-read on the same block, why don't we just change it to
> do the re-read on the same block properly and see if that fixes the
> retry, instead of just giving up and skipping..?

It was re-reading the block, and then continuing to read the rest of the
file after it still got a short read on the re-read.

> I'm not necessarily against skipping to the next file, to be clear,
> but I think I'd be happier if we kept reading the file until we
> actually get EOF.

So if we read half a block twice we should seek() to the next block and
continue till EOF, ok. I think in most cases those pages will be new
anyway and there will be no checksum check, but it sounds like a cleaner
approach. I've seen one or two examples where we did successfully verify
the checksum of a page after a half-read, so it might be worth it.

The alternative would be to just bail out early and skip the file on the
first short read and (possibly) log a skipped file.

I still think that an external checksum verification tool has some
merit, given that basebackup does it and the current offline requirement
is really not useful in practice.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is governed by
the following provisions: https://www.credativ.de/datenschutz
