Re: Online verification of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-18 07:52:28
Message-ID: 1552895548.9697.35.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost:
> * Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
> > Am Montag, den 18.03.2019, 02:38 -0400 schrieb Stephen Frost:
> > > * Michael Paquier (michael(at)paquier(dot)xyz) wrote:
> > > > On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen Frost wrote:
> > > > > To be clear, I agree completely that we don't want to be reporting false
> > > > > positives or "this might mean corruption!" to users running the tool,
> > > > > but I haven't seen a good explaination of why this needs to involve the
> > > > > server to avoid that happening. If someone would like to point that out
> > > > > to me, I'd be happy to go read about it and try to understand.
> > > >
> > > > The mentions on this thread that the server has all the facility in
> > > > place to properly lock a buffer and make sure that a partial read
> > > > *never* happens and that we *never* have any kind of false positives,
> > >
> > > Uh, we are, of course, going to have partial reads- we just need to
> > > handle them appropriately, and that's not hard to do in a way that we
> > > never have false positives.
> >
> > I think the current patch (V13 from https://www.postgresql.org/message-i
> > d/1552045881(dot)4947(dot)43(dot)camel(at)credativ(dot)de) does that, modulo possible bugs.
>
> I think the question here is- do you ever see false positives with this
> latest version..? If you are, then that's an issue and we should
> discuss and try to figure out what's happening. If you aren't seeing
> false positives, then it seems like we're done here, right?

What do you mean with false positives here? I've never seen a bogus
checksum failure, i.e. pg_checksums claiming some checksum is wrong
cause it only read half of a block or a torn page.

I do see sporadic partial reads and they get treated by the re-check
logic and (if that is not enough) get tallied up as a skipped block in
the end. Is that a false positive in your book?

[...]

> > I have now rebased that patch on top of the pg_verify_checksums ->
> > pg_checksums renaming, see attached.
>
> Thanks for that. Reading through the code though, I don't entirely
> understand why we're making things complicated for ourselves by trying
> to seek and re-read the entire block, specifically this:

[...]

> I would think that we could just do:
>
> insert_location = 0;
> r = read(BLCKSIZE - insert_location);
> if (r < 0) error();
> if (r == 0) EOF detected, move to next
> if (r < (BLCKSIZE - insert_location)) {
> insert_location += r;
> continue;
> }
>
> At this point, we should have a full block, do our checks...

Well, we need to read() into some buffer which you have ommitted.

So if we had a short read, and then read the rest of the block via
(BLCKSIZE - insert_location) wouldn't we have to read that in a second
buffer and then join the two in order to compute the checksum? That
does not sounds simpler to me than just re-reading the block entirely.

> Have you seen cases where the kernel will actually return a partial read
> for something that isn't at the end of the file, and where you could
> actually lseek past that point and read the next block? I'd be really
> curious to see that if you can reproduce it... I've definitely seen
> empty pages come back with a claim that the full amount was read, but
> that's a very different thing.

Well, I've seen partial reads and I have seen very rarely that it will
continue to read another block afterwards. If the relation is being
extended while we check it, it sounds plausible that another block could
be written before we get to read EOF on the next read() after a partial
read() so that does not sounds like a bug to me either.

I might be misunderstanding your question though?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2019-03-18 08:00:24 Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead
Previous Message Chris Travers 2019-03-18 07:45:44 Re: Data-only pg_rewind, take 2