Re: Online verification of checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Online verification of checksums
Date: 2019-03-18 21:50:36
Message-ID: CAOuzzgrZyz_vgRjd8GrLen39B7KWXY-qTyobDE5nP2mSTtGOgQ@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Tue, Mar 19, 2019 at 04:15 Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Am Montag, den 18.03.2019, 16:11 +0800 schrieb Stephen Frost:
> > On Mon, Mar 18, 2019 at 15:52 Michael Banck <michael(dot)banck(at)credativ(dot)de>
> wrote:
> > > Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost:
> > > > Thanks for that. Reading through the code though, I don't entirely
> > > > understand why we're making things complicated for ourselves by
> trying
> > > > to seek and re-read the entire block, specifically this:
> > >
> > > [...]
> > >
> > > > I would think that we could just do:
> > > >
> > > > insert_location = 0;
> > > > r = read(BLCKSIZE - insert_location);
> > > > if (r < 0) error();
> > > > if (r == 0) EOF detected, move to next
> > > > if (r < (BLCKSIZE - insert_location)) {
> > > > insert_location += r;
> > > > continue;
> > > > }
> > > >
> > > > At this point, we should have a full block, do our checks...
> > >
> > > Well, we need to read() into some buffer which you have omitted.
> >
> > Surely there’s a buffer that the read in the existing code is passing
> > in; you just need to offset by the current pointer, sorry for not being
> > clear.
> >
> > In other words the read would look more like:
> >
> > read(fd,buf + insert_ptr, BUFSZ - insert_ptr)
> >
> > And then you have to reset insert_ptr once you have a full block.
>
> Ok, thanks for clearing that up.
>
> I've tried to do that now in the attached, does that suit you?

Yes, that’s what I was thinking. I’m honestly not entirely convinced that
the lseek() efforts still need to be put in- I would have thought it’d be
fine to simply check the LSN on a checksum failure and mark it as skipped
if the LSN is past the current checkpoint. That seems like it would make
things much simpler, but I’m also not against keeping that logic now that
it’s in, provided it doesn’t cause issues.

> > Yes, the question was more like this: have you ever seen a read return
> > a partial result when you know you’re in the middle somewhere of an
> > existing file and the length of the file hasn’t been changed by
> > something else..?
>
> I don't think I've seen that, but that wouldn't turn up in regular
> testing anyway I guess but only in pathological cases? I guess we are
> probably dealing with this in the current version of the patch, but I
> can't say for certain as it sounds pretty difficult to test.

Yeah, a lot of things in this area are unfortunately difficult to test.
I’m glad to hear that it doesn’t sound like you’ve seen it though.

> I have also added a paragraph to the documentation about possibly
> skipping new or recently updated pages:
>
> + If the cluster is online, pages that have been (re-)written since the
> + last checkpoint will not count as checksum failures if they cannot be
> + read or verified correctly.

I would flip this around:

---
In an online cluster, pages are being concurrently written to the files
while the check is being run, leading to possible torn pages or partial
reads. When the tool detects a concurrently written page, indicated by the
page’s LSN being beyond the checkpoint the tool started at, that page will
be reported as skipped. Note that in a crash scenario, any pages written
since the last checkpoint will be replayed from the WAL.
---

Now here’s the $64 question- have you tested this latest version under
load..? If not, could you? And when you do, can you report back what the
results are? Do you still see any actual checksum failures? Do the number
of skipped pages seem reasonable in your tests or is there a concern there?

If you still see actual checksum failures which aren’t because the LSN is
higher than the checkpoint, or because of a short read, then we need to
investigate further but hopefully that isn’t happening now. I think a lot
of the concerns raised on this thread about wanting to avoid false
positives are because the torn page (with higher LSN than current
checkpoint) and short read cases were previously reported as failures when
they really are expected. Let’s test this as much as we can and make sure
we aren’t seeing false positives anymore.

Thanks!

Stephen
