Re: Online verification of checksums

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2020-11-23 21:17:54
Message-ID: 97cbb4a6-ee0a-4cad-6b65-84e06d14dfe9@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23.11.2020 18:35, Stephen Frost wrote:
> Greetings,
>
> * Anastasia Lubennikova (a(dot)lubennikova(at)postgrespro(dot)ru) wrote:
>> On 21.11.2020 04:30, Michael Paquier wrote:
>>> The only method I can think as being really
>>> reliable is based on two facts:
>>> - Do a check only on pd_checksums, as that validates the full contents
>>> of the page.
>>> - When doing a retry, make sure that there is no concurrent I/O
>>> activity in the shared buffers. This requires an API we don't have
>>> yet.
>> It seems reasonable to me to rely on checksums only.
>>
>> As for retry, I think that API for concurrent I/O will be complicated.
>> Instead, we can introduce a function to read the page directly from shared
>> buffers after PAGE_RETRY_THRESHOLD attempts. It looks like a bullet-proof
>> solution to me. Do you see any possible problems with it?
> We might end up reading pages back in that have been evicted, for one
> thing, which doesn't seem great,
TBH, I think it is highly unlikely that the page that was just updated
will be evicted.
> and this also seems likely to be
> awkward for cases which aren't using the replication protocol, unless
> every process maintains a connection to PG the entire time, which also
> doesn't seem great.
Have I missed something? Now pg_basebackup has only one process + one
child process for streaming. Anyway, I totally agree with your argument.
The need to maintain connection(s) to PG is the most unpleasant part of
the proposed approach.

> Also- what is the point of reading the page from shared buffers
> anyway..?
Well... Reading a page from shared buffers is a reliable way to get a
correct page from postgres under any concurrent load. So it just seems
natural to me.
> All we need to do is prove that the page will be rewritten
> during WAL replay.
Yes and this is a tricky part. Until you have explained it in your
latest message, I wasn't sure how we can distinct concurrent update from
a page header corruption. Now I agree that if page LSN updated and
increased between rereads, it is safe enough to conclude that we have
some concurrent load.

> If we can prove that, we don't actually care what
> the contents of the page are. We certainly can't calculate the
> checksum on a page we plucked out of shared buffers since we only
> calculate the checksum when we go to write the page out.
Good point. I was thinking that we can recalculate checksum. Or even
save a page without it, as we have checked LSN and know for sure that it
will be rewritten by WAL replay.

To sum up, I agree with your proposal to reread the page and rely on
ascending LSNs. Can you submit a patch?
You can write it on top of the latest attachment in this thread:
v8-master-0001-Fix-page-verifications-in-base-backups.patch
<https://www.postgresql.org/message-id/attachment/115403/v8-master-0001-Fix-page-verifications-in-base-backups.patch>
from this message
https://www.postgresql.org/message-id/20201030023028.GC1693@paquier.xyz

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-23 21:20:29 Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
Previous Message Tom Lane 2020-11-23 21:14:18 Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)