Re: Online verification of checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-18 05:43:08
Lists: pgsql-hackers


* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On 3/2/19 12:03 AM, Robert Haas wrote:
> > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> > <michael(dot)banck(at)credativ(dot)de> wrote:
> >> I have added a retry for this as well now, without a pg_sleep() as well.
> >> This catches around 80% of the half-reads, but a few slip through. At
> >> that point we bail out with exit(1), and the user can try again, which I
> >> think is fine?
> >
> > Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> > robust at all.
> FWIW I don't think this qualifies as a torn page - i.e. it's not a full
> read with a mix of old and new data. This is a partial write, most likely
> because we read the blocks one by one, and when we hit the last page
> while the table is being extended, we may only see the first 4kB. And if
> we retry very fast, we may still see only the first 4kB.

I really still am not following why this is such an issue: we do a read,
get back 4KB, do another read, check if it's zero, and if so then we
should be able to conclude that we're at the end of the file, no? If
we're at the end of the file and we don't have a final complete block to
run a checksum check on then it seems clear to me that the file was
being extended and it's ok to skip that block. We could also stat the
file and keep track of where we are, to detect such an extension of the
file happening, if we wanted an additional cross-check, couldn't we? If
we do a read and get 4KB back and then do another and get 4KB back, then
we just treat it like we would an 8KB block. Really, as long as a
subsequent read is returning bytes then we keep going, and if it returns
zero then it's EOF. I could maybe see a "one final read" option, but I
don't think it makes sense to have some kind of time-based delay around
this where we keep trying to read.
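
For illustration, the read loop described above could be sketched roughly
as follows. This is a sketch under stated assumptions, not code from the
patch under discussion: the names read_block, scan_blocks and the verify
callback are hypothetical, the block size is assumed to be the default
8192 bytes, and the actual checksum computation is left to the caller.

```python
import os

BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def read_block(fd, blocksize=BLCKSZ):
    """Read up to one block from fd.

    Keeps reading while read() returns bytes, so two 4KB reads are
    treated like one 8KB block. Stops early only when read() returns
    zero bytes, which means end of file.
    """
    chunks = []
    remaining = blocksize
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # zero bytes returned: EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def scan_blocks(path, verify, blocksize=BLCKSZ):
    """Call verify(blockno, data) for every complete block in the file.

    Returns (complete_blocks, trailing_bytes). A non-zero trailing_bytes
    means the file ended with a partial block, which is skipped on the
    assumption that the file was being extended during the scan.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        blockno = 0
        while True:
            data = read_block(fd, blocksize)
            if len(data) < blocksize:
                # EOF reached, with or without a partial final block.
                return blockno, len(data)
            verify(blockno, data)  # hypothetical checksum callback
            blockno += 1
    finally:
        os.close(fd)
```

There is no sleep or time-based retry here: the only condition that ends
the scan is a read returning zero bytes. The stat-based cross-check
mentioned above is not shown.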

All of this about hacking up a way to connect to PG and lock pages in
shared buffers so that we can perform a checksum check seems really
rather ridiculous for either the extension case or the regular mid-file
torn-page case.

To be clear, I agree completely that we don't want to be reporting false
positives or "this might mean corruption!" to users running the tool,
but I haven't seen a good explanation of why this needs to involve the
server to avoid that happening. If someone would like to point that out
to me, I'd be happy to go read about it and try to understand.


