Re: Online verification of checksums

From: David Steele <david(at)pgmasters(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2021-03-09 17:43:46
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/30/20 6:38 PM, David Steele wrote:
> On 11/30/20 9:27 AM, Stephen Frost wrote:
>> * Michael Paquier (michael(at)paquier(dot)xyz) wrote:
>>> On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote:
>>>> * Magnus Hagander (magnus(at)hagander(dot)net) wrote:
>>>>> On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier
>>>>> <michael(at)paquier(dot)xyz> wrote:
>>>>>> But here the checksum is broken, so while the offset is something we
>>>>>> can rely on how do you make sure that the LSN is fine?  A broken
>>>>>> checksum could perfectly mean that the LSN is actually *not* fine if
>>>>>> the page header got corrupted.
>>>> Of course that could be the case, but it gets to be a smaller and
>>>> smaller chance by checking that the LSN read falls within reasonable
>>>> bounds.
>>> FWIW, I find that scary.
>> There's ultimately different levels of 'scary' and the risk here that
>> something is actually wrong following these checks strikes me as being
>> on the same order as random bits being flipped in the page and still
>> getting a valid checksum (which is entirely possible with our current
>> checksum...), or maybe even less.
> I would say a lot less. First you'd need to corrupt one of the eight
> bytes that make up the LSN (pretty likely since corruption will probably
> affect the entire block) and then it would need to be updated to a value
> that falls within the current backup range, a 1 in 16 million chance if
> a terabyte of WAL is generated during the backup. Plus, the corruption
> needs to happen during the backup since we are going to check for that,
> and the corrupted LSN needs to be ascending, and the LSN originally read
> needs to be within the backup range (another 1 in 16 million chance)
> since pages written before the start backup checkpoint should not be torn.
> So as far as I can see there are more likely to be false negatives from
> the checksum itself.
> It would also be easy to add a few rounds of checks, i.e. test if the
> LSN ascends but stays in the backup LSN range N times.
> Honestly, I'm much more worried about corruption zeroing the entire
> page. I don't know how likely that is, but I know none of our proposed
> solutions would catch it.
> Andres, since you brought this issue up originally perhaps you'd like to
> weigh in?

I had another look at this patch and though I think my suggestions above
would improve the patch, I have no objections to going forward as is (if
that is the consensus) since this seems an improvement over what we have

It comes down to whether you prefer false negatives or false positives.
With the LSN checking Stephen and I advocate it is theoretically
possible to have a false negative but the chances of the LSN ascending N
times but staying within the backup LSN range due to corruption seems to
be approaching zero.

I think Michael's method is unlikely to throw false positives, but it
seems at least possible that a block would be hot enough to be appear
torn N times in a row. Torn pages themselves are really easy to reproduce.

If we do go forward with this method I would likely propose the
LSN-based approach as a future improvement.


In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-03-09 17:44:46 Re: [patch] [doc] Minor variable related cleanup and rewording of plpgsql docs
Previous Message John Naylor 2021-03-09 17:38:57 Re: non-HOT update not looking at FSM for large tuple update