Re: Online verification of checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2020-11-15 15:37:36
Message-ID: CABUevEwspcASScjnpK-Eb3fuP3fpxGFAzx3EhDE4nfG=xAsUUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 10, 2020 at 5:44 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Nov 05, 2020 at 10:57:16AM +0900, Michael Paquier wrote:
> > I was referring to the patch I sent on this thread that fixes the
> > detection of a corruption for the zero-only case and where pd_lsn
> > and/or pg_upper are trashed by a corruption of the page header. Both
> > cases allow a base backup to complete on HEAD, while sending pages
> > that could be corrupted, which is wrong. Once you make the page
> > verification rely only on pd_checksum, as the patch does because the
> > checksum is the only source of truth in the page header, corrupted
> > pages are correctly detected, causing pg_basebackup to complain as it
> > should. However, it has also the risk to cause pg_basebackup to fail
> > *and* to report as broken pages that are in the process of being
> > written, depending on how slow a disk is able to finish a 8kB write.
> > That's a different kind of wrongness, and users have two more reasons
> > to be pissed. Note that if a page is found as torn we have a
> > consistent page header, meaning that on HEAD the PageIsNew() and
> > PageGetLSN() would pass, but the checksum verification would fail as
> > the contents at the end of the page does not match the checksum.
>
> Magnus, as the original committer of 4eb77d5, do you have an opinion
> to share?
>

I admit that I at some point lost track of the overlapping threads around
this, and just figured there was enough different checksum-involved-people
on those threads to handle it :) Meaning the short answer is "no, I don't
really have one at this point".

Slightly longer comment is that it does seem reasonable, but I have not
read in on all the different issues discussed over the whole thread, so
take that as a weak-certainty comment.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2020-11-15 15:51:34 Re: Online checksums patch - once again
Previous Message Heikki Linnakangas 2020-11-15 15:10:53 Re: Refactor pg_rewind code and make it work against a standby