From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On 09/18/2018 12:01 AM, Stephen Frost wrote:
> > * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> >> On 09/17/2018 07:35 PM, Stephen Frost wrote:
> >> But the trick is that if the read sees the effect of the write somewhere
> >> in the middle of the page, the next read is guaranteed to see all the
> >> preceding new data.
> > If that's guaranteed then we can just check the LSN and be done.
> What do you mean by "check the LSN"? Compare it to LSN from the first
> read? You don't know if the first read already saw the new LSN or not
> (see the next example).
Hmm, ok, I can see your point there. I've been going back and forth
between checking against the prior LSN on the page and checking against
an independent source (like the last checkpoint's LSN), but...
> Comparing the page LSN to the last checkpoint LSN solves this, because
> if the LSN is older than the checkpoint LSN, that write must have been
> completed by now, and so we're not in danger of seeing only incomplete
> effects of it. And any newer write will update the LSN.
Yeah, that makes sense. We need to be looking at something which only
gets updated once the write has actually completed, and the last
checkpoint's LSN gives us that guarantee.
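
To make the comparison concrete, here is a minimal sketch in Python
(chosen for brevity, not because any of the tools are written in it).
It assumes the usual page layout, where pd_lsn is the first 8 bytes of
the page stored as two 32-bit halves (high, then low), and it assumes a
little-endian host. The names page_lsn and may_be_in_flight are made up
for illustration and do not exist in the tree.

```python
import struct

def page_lsn(page: bytes) -> int:
    """Return pd_lsn from the page header.

    Assumption: little-endian host; pd_lsn is stored as two uint32
    values (high half first, then low half) at offset 0.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def may_be_in_flight(page: bytes, checkpoint_lsn: int) -> bool:
    """True if the page was last modified at or after the checkpoint LSN.

    Such a page may still be getting written, so a checksum failure on
    it cannot be told apart from a torn read. Pages with an older LSN
    must have been fully written by the time the checkpoint completed.
    """
    return page_lsn(page) >= checkpoint_lsn
```

Using >= rather than > is the conservative choice here: a page stamped
with exactly the checkpoint LSN is treated as possibly in flight.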
> > The problem that we aren't solving for is if, somehow, we do a read(8K)
> > and get the first half/second half mixup and then on a subsequent
> > read(8K) we see that *again*, implying that somehow the kernel's copy
> > has the latter-half of the page updated consistently but not the first
> > half. That's a problem that I haven't got a solution to today. I'd
> > love to have a guarantee that it's not possible- we've certainly never
> > seen it but it's been a concern and I thought Michael was suggesting
> > he'd seen that, but it sounds like there wasn't a check on the LSN in
> > the first read, in which case it could have just been a 'regular' torn
> > page case.
> Well, yeah. If that would be possible, we'd be in serious trouble. I've
> done quite a bit of experimentation with concurrent reads and writes and
> I have not observed such behavior. Of course, that's hardly a proof it
> can't happen, and it wouldn't be the first surprise with respect to
> kernel I/O this year ...
I'm glad to hear that you've done a lot of experimentation in this area
and haven't seen such strange behavior. We've got quite a few people
running pgBackRest with checksum checking and haven't seen it either,
but it's always been a bit of a concern.
> You're right it's not about the fsync, sorry for the confusion. My point
> is that using the checkpoint LSN gives us a guarantee that the write is
> no longer in progress, and so we can't see a page torn because of it.
> And if we see a partial write due to a new write, it's guaranteed to
> update the page LSN (and we'll notice it).
Right, no worries about the confusion. I hadn't fully thought through
the LSN bit either: what we really need is some external confirmation
that a write has *completed* (not just started), and that makes a
definite difference.
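
Putting the pieces together, the per-block logic could look roughly
like the sketch below (Python again, purely illustrative). Everything
here is hypothetical: check_block and its callback parameters are
invented names, the checksum routine is passed in rather than
implemented, and the single re-read is an assumption about how one
might handle an ordinary torn read, not a description of what any
existing tool does.

```python
from typing import Callable

def check_block(read_block: Callable[[], bytes],
                checksum_ok: Callable[[bytes], bool],
                lsn_of: Callable[[bytes], int],
                checkpoint_lsn: int,
                retries: int = 1) -> str:
    """Classify one block as "ok", "skipped", or "corrupt".

    read_block  -- re-reads the same block from disk on every call
    checksum_ok -- stand-in for the real page checksum verification
    lsn_of      -- extracts the page LSN from the page header
    """
    for _ in range(retries + 1):
        page = read_block()
        # A page at or past the checkpoint LSN may still be getting
        # written, so a checksum result for it proves nothing.
        if lsn_of(page) >= checkpoint_lsn:
            return "skipped"
        if checksum_ok(page):
            return "ok"
        # Checksum failed on an old page: try a fresh read in case the
        # first one was torn by a write that has since become visible.
    return "corrupt"
```

The point of the ordering is that a failure only gets reported when the
page LSN says the write completed before the checkpoint and the
checksum still doesn't match after re-reading.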
> > Right, I'm in agreement with doing that and it's what is done in
> > pg_basebackup and pgBackRest.
> OK. All I'm saying is pg_verify_checksums should probably do the same
> thing, i.e. grab checkpoint LSN and roll with that.