Re: Online verification of checksums

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2020-11-24 17:38:30
Message-ID: 8a4df8ee-0381-26ad-d09a-0367f03914a1@pgmasters.net
Lists: pgsql-hackers

Hi Michael,

On 11/23/20 8:10 PM, Michael Paquier wrote:
> On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote:
>
>> Also- what is the point of reading the page from shared buffers
>> anyway..? All we need to do is prove that the page will be rewritten
>> during WAL replay. If we can prove that, we don't actually care what
>> the contents of the page are. We certainly can't calculate the
>> checksum on a page we plucked out of shared buffers since we only
>> calculate the checksum when we go to write the page out.
>
> A LSN-based check makes the thing tricky. How do you make sure that
> pd_lsn is not itself broken? It could be perfectly possible that a
> random on-disk corruption makes pd_lsn seen as having a correct value,
> still the rest of the page is borked.

We are not just looking at one LSN value. Here are the steps we are
proposing (I'll skip checks for zero pages here; a rough C sketch
follows the list):

1) Test the page checksum. If it passes, the page is OK.
2) If the checksum does not pass, record the page offset and LSN and
continue.
3) After the file is copied, reopen and reread the file, seeking to the
offsets where possibly invalid pages were recorded in the first pass.
a) If the page is now valid, then it is OK.
b) If the page is not valid but the LSN has increased from the LSN
recorded in the previous pass, then it is OK. We can infer this because
the LSN has advanced in a way that is not consistent with storage
corruption.
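
To make that concrete, here is a rough C sketch of the retry pass
(step 3). The RetryPage struct, pageChecksumOk(), and the error
reporting are hypothetical placeholders rather than actual pgBackRest
code; pageChecksumOk() is assumed to wrap something like PostgreSQL's
pg_checksum_page().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define PAGE_SIZE 8192

/* Offset and pd_lsn recorded for a page that failed validation in the
 * first pass (step 2) */
typedef struct RetryPage
{
    off_t offset;
    uint64_t lsn;
} RetryPage;

/* Hypothetical helper, e.g. a thin wrapper around PostgreSQL's
 * pg_checksum_page(page, blkno) compared against pd_checksum */
bool pageChecksumOk(unsigned char *page, uint32_t blkno);

/* pd_lsn is the first eight bytes of the page header: the high 32 bits
 * of the LSN followed by the low 32 bits, in host byte order */
static uint64_t
pageLsn(const unsigned char *page)
{
    uint32_t hi;
    uint32_t lo;

    memcpy(&hi, page, sizeof(hi));
    memcpy(&lo, page + sizeof(hi), sizeof(lo));

    return ((uint64_t) hi << 32) | lo;
}

/* Step 3: reread only the pages recorded in the first pass */
static void
retryPass(FILE *file, const RetryPage *retryList, size_t retryTotal)
{
    unsigned char page[PAGE_SIZE];

    for (size_t i = 0; i < retryTotal; i++)
    {
        if (fseeko(file, retryList[i].offset, SEEK_SET) != 0 ||
            fread(page, 1, PAGE_SIZE, file) != PAGE_SIZE)
            continue;               /* truncated since the first pass */

        /* 3a) the checksum is now valid */
        if (pageChecksumOk(page, (uint32_t)(retryList[i].offset / PAGE_SIZE)))
            continue;

        /* 3b) the LSN has advanced, so the page is being rewritten
         * rather than sitting corrupted on disk */
        if (pageLsn(page) > retryList[i].lsn)
            continue;

        fprintf(stderr, "invalid page at offset %lld\n",
                (long long) retryList[i].offset);
    }
}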

This is what we are planning for the first round of improvements to our
page checksum validation. We believe that doing the retry in a second
pass will be faster and more reliable: enough time will have passed
since the first read that we don't need to build in a delay for each
page error.

A further improvement is to check the ascending LSNs found in 3b against
PostgreSQL to be completely sure they are valid. We are planning this
for our second round of improvements.
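
Purely as an illustration of what that cross-check might look like
(this is a sketch of one possible form, not a settled design): a page
that was legitimately rewritten during the backup should carry an LSN
between the backup start LSN and the server's current WAL insert
position, e.g. as returned by pg_start_backup() and
pg_current_wal_insert_lsn().

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only -- the bounds would come from the server */
static bool
lsnPlausible(uint64_t lsn, uint64_t backupStartLsn, uint64_t currentInsertLsn)
{
    /* An LSN earlier than the backup start or later than any WAL the
     * server has generated cannot belong to a page rewritten during
     * the backup and would be flagged for further investigation */
    return lsn >= backupStartLsn && lsn <= currentInsertLsn;
}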

Reopening the file for the second pass does require some additional
logic (a sketch follows the list):

1) The file may have been deleted by PG since the first pass, and in
that case we won't report any page errors for it.
2) The file may have been truncated by PG since the first pass, so we
won't report any errors past the point of truncation.
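
As a rough illustration of that reopen logic (names are placeholders
and error handling is trimmed):

#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Reopen a file for the retry pass. Returns false when every recorded
 * page error for the file should be discarded because the file has
 * been deleted; otherwise sets *fileSize so that errors recorded at or
 * past the truncation point can be skipped. */
static bool
retryOpen(const char *fileName, FILE **file, off_t *fileSize)
{
    *file = fopen(fileName, "rb");

    /* 1) Deleted by PG since the first pass, so discard all recorded
     *    page errors for this file (a real implementation would check
     *    errno for ENOENT rather than treating every failure this way) */
    if (*file == NULL)
        return false;

    /* 2) Note the current size -- any page error recorded at or beyond
     *    this offset lies past the truncation point and is not reported */
    fseeko(*file, 0, SEEK_END);
    *fileSize = ftello(*file);
    rewind(*file);

    return true;
}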

A malicious attacker could easily trick these checks, but as Stephen
pointed out elsewhere, an attacker would more likely just make the
checksums valid, which would escape detection anyway.

We believe that the chances of random storage corruption passing all
these checks are incredibly small, but eventually we'll also check
against the WAL to be completely sure.

Regards,
--
-David
david(at)pgmasters(dot)net
