Re: Online verification of checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2018-11-22 17:39:28
Message-ID: 20181122173928.GB3415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On 11/22/18 2:12 AM, Stephen Frost wrote:
> >* Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
> >>On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> >>>The "check if page was modified since checkpoint" does not look useful when
> >>>offline. Maybe it lacks a comment to say that this cannot (should not ?)
> >>>happen when offline, but even then I would not like it to be true: ISTM that
> >>>no page should be allowed to be skipped on the checkpoint condition when
> >>>offline, but it is probably ok to skip with the new page test, which make me
> >>>still think that they should be counted and reported separately, or at least
> >>>the checkpoint skip test should not be run when offline.
> >>
> >>What is the rationale to not skip on the checkpoint condition when the
> >>instance is offline? If it was shutdown cleanly, this should not
> >>happen, if the instance crashed, those would be spurious errors that
> >>would get repaired on recovery.
> >>
> >>I have not changed that for now.
> >
> >Agreed- this is an important check even in offline mode.
>
> Yeah. I suppose we could detect if the shutdown was clean (like pg_rewind
> does), and then skip the check. Or perhaps we should still do the check
> (without a retry), and report it as issue when we find a page with LSN newer
> than the last checkpoint.

I agree that it'd be nice to report an issue if it's a clean shutdown
but there's an LSN newer than the last checkpoint, though I suspect that
would be more useful in debugging and such and not so useful for users.

> In any case, the check is pretty cheap (comparing two 64-bit values), and I
> don't see how skipping it would optimize anything. It would make the code a
> tad simpler, but we still need the check for the online mode.

Yeah, I'd just keep the check.

> A minor detail is that the reads/writes should be atomic at the sector
> level, which used to be 512B, so it's not just about pages torn in 4kB/4kB
> manner, but possibly an arbitrary mix of 512B chunks from old and new
> version.

Sure.

> This also explains why we don't need any delay - the reread happens after
> the write must have already written the page header, so the new LSN must be
> already visible.

Agreed.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-11-22 18:03:21 Re: Tid scan improvements
Previous Message Tomas Vondra 2018-11-22 17:29:04 Re: Online verification of checksums