Re: Online verification of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2018-11-22 17:29:04
Message-ID: 53b04b08-e4f4-9d94-e57b-07b38b14705f@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/22/18 2:12 AM, Stephen Frost wrote:
> Greetings,
>
> * Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
>> On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
>>> The "check if page was modified since checkpoint" does not look useful when
>>> offline. Maybe it lacks a comment to say that this cannot (should not ?)
>>> happen when offline, but even then I would not like it to be true: ISTM that
>>> no page should be allowed to be skipped on the checkpoint condition when
>>> offline, but it is probably ok to skip with the new page test, which make me
>>> still think that they should be counted and reported separately, or at least
>>> the checkpoint skip test should not be run when offline.
>>
>> What is the rationale to not skip on the checkpoint condition when the
>> instance is offline? If it was shutdown cleanly, this should not
>> happen, if the instance crashed, those would be spurious errors that
>> would get repaired on recovery.
>>
>> I have not changed that for now.
>
> Agreed- this is an important check even in offline mode.
>

Yeah. I suppose we could detect if the shutdown was clean (like
pg_rewind does), and then skip the check. Or perhaps we should still do
the check (without a retry), and report it as issue when we find a page
with LSN newer than the last checkpoint.

In any case, the check is pretty cheap (comparing two 64-bit values),
and I don't see how skipping it would optimize anything. It would make
the code a tad simpler, but we still need the check for the online mode.

>>> When offline, the retry logic does not make much sense, it should complain
>>> directly on the first error? Also, I'm unsure of the read & checksum retry
>>> logic *without any delay*.
>
> The race condition being considered here is where an 8k read somehow
> gets the first 4k, then is scheduled off-cpu, and the full 8k page is
> then written by some other process, and then this process is woken up
> to read the second 4k. I agree that this is unnecessary when the
> database is offline, but it's also pretty cheap. When the database is
> online, it's an extremely unlikely case to hit (just try to reproduce
> it...) but if it does get hit then it's easy enough to recheck by doing
> a reread, which should show that the LSN has been updated in the first
> 4k and we can then know that this page is in the WAL. We have not yet
> seen a case where such a re-read returns an old LSN and an invalid
> checksum; based on discussion with other hackers, that shouldn't be
> possible as every kernel seems to consistently write in-order, meaning
> that the first 4k will be updated before the second, so a single re-read
> should be sufficient.
>

Right.

A minor detail is that the reads/writes should be atomic at the sector
level, which used to be 512B, so it's not just about pages torn in
4kB/4kB manner, but possibly an arbitrary mix of 512B chunks from old
and new version.

This also explains why we don't need any delay - the reread happens
after the write must have already written the page header, so the new
LSN must be already visible.

So no delay is necessary. And if it was, how long should the delay be?
The processes might end up off-CPU for arbitrary amount of time, so
picking a good value would be pretty tricky.

> Remember- this is all in-memory activity also, we aren't talking about
> what might happen on disk here.
>
>> I think the small overhead of retrying in offline mode even if useless
>> is worth avoiding making the code more complicated in order to cater for
>> both modes.
>
> Agreed.
>
>> Initially there was a delay, but this was removed after analysis and
>> requests by several other reviewers.
>
> Agreed, there's no need for or point to having such a delay.
>

Yep.

>>>>> This might suggest some option to tell the command that it should work in
>>>>> online or offline mode, so that it may be stricter in some cases. The
>>>>> default may be one of the option, eg the stricter offline mode, or maybe
>>>>> guessed at startup.
>>>>
>>>> If we believe the operation should be different, the patch removes the
>>>> "is cluster online?" check (as it is no longer necessary), so we could
>>>> just replace the current error message with a global variable with the
>>>> result of that check and use it where needed (if any).
>>>
>>> That could let open the issue of someone starting the check offline, and
>>> then starting the database while it is not finished. Maybe it is not worth
>>> sweating about such a narrow use case.
>>
>> I don't think we need to cater for that, yeah.
>
> Agreed.
>

Yep. I don't think other tools protect against that either. And
pg_rewind does actually modify the cluster state, unlike checksum
verification.

>>> It would also be nice if the test could apply on an active database,
>>> eg with a low-rate pgbench running in parallel to the verification,
>>> but I'm not sure how easy it is to add such a thing.
>>
>> That sounds much more complicated so I have not tackled that yet.
>
> I agree that this would be nice, but I don't want the regression tests
> to become much longer...
>

I have to admit I find this thread rather confusing, because the subject
is "online verification of checksums" yet we're discussing verification
on offline instances.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-11-22 17:39:28 Re: Online verification of checksums
Previous Message Vik Fearing 2018-11-22 17:27:19 Re: New GUC to sample log queries