Re: Online verification of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-02-16 12:22:58
Message-ID: 1550319778.12689.8.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Am Mittwoch, den 06.02.2019, 11:39 -0500 schrieb Stephen Frost:
> * Michael Banck (michael(dot)banck(at)credativ(dot)de) wrote:
> > Am Dienstag, den 05.02.2019, 11:30 +0100 schrieb Tomas Vondra:
> > > On 2/5/19 8:01 AM, Andres Freund wrote:
> > > > On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > > > > > I'm wondering (possibly again) about the existing early exit if one block
> > > > > > > > cannot be read on retry: the command should count this as a kind of bad
> > > > > > > > block, proceed on checking other files, and obviously fail in the end, but
> > > > > > > > having checked everything else and generated a report. I do not think that
> > > > > > > > this condition warrants a full stop. ISTM that under rare race conditions
> > > > > > > > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > > > > > > > happen when online, although I could not trigger one despite heavy testing,
> > > > > > > > so I'm possibly mistaken.
> > > > > > >
> > > > > > > This seems like a defensible judgement call either way.
> > > > > >
> > > > > > Right now we have a few tests that explicitly check that
> > > > > > pg_verify_checksums fail on broken data ("foo" in the file). Those
> > > > > > would then just get skipped AFAICT, which I think is the worse behaviour
> > > > > > , but if everybody thinks that should be the way to go, we can
> > > > > > drop/adjust those tests and make pg_verify_checksums skip them.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > My point is that it should fail as it does, only not immediately (early
> > > > > exit), but after having checked everything else. This mean avoiding calling
> > > > > "exit(1)" here and there (lseek, fopen...), but taking note that something
> > > > > bad happened, and call exit only in the end.
> > > >
> > > > I can see both as being valuable (one gives you a more complete picture,
> > > > the other a quicker answer in scripts). For me that's the point where
> > > > it's the prerogative of the author to make that choice.
>
> ... unless people here object or prefer other options, and then it's up
> to discussion and hopefully some consensus comes out of it.
>
> Also, I have to say that I really don't think the 'quicker answer'
> argument holds any weight, making me question if that's a valid
> use-case. If there *isn't* an issue, which we would likely all agree is
> the case the vast majority of the time that this is going to be run,
> then it's going to take quite a while and anyone calling it should
> expect and be prepared for that. In the extremely rare cases, what does
> exiting early actually do for us?
>
> > Personally, I would prefer to keep it as simple as possible for now and
> > get this patch committed; in my opinion the behaviour is already like
> > this (early exit on corrupt files) so I don't think the online
> > verification patch should change this.
>
> I'm also in the camp of "would rather it not exit immediately, so the
> extent of the issue is clear".
>
> > If we see complaints about this, then I'd be happy to change it
> > afterwards.
>
> I really don't think this is something we should change later on in a
> future release.. If the consensus is that there's really two different
> but valid use-cases then we should make it configurable, but I'm not
> convinced there is.

OK, fair enough.

> > > Why not make this configurable, using a command-line option?
> >
> > I like this even less - this tool is about verifying checksums, so
> > adding options on what to do when it encounters broken pages looks out-
> > of-scope to me. Unless we want to say it should generally abort on the
> > first issue (i.e. on wrong checksums as well).
>
> I definitely disagree that it's somehow 'out of scope' for this tool to
> skip broken pages, when we can tell that they're broken.

I didn't mean that it's out-of-scope for pg_verify_checksums, I meant it
is out-of-scope for this patch, which adds online checking.

> There is a question here about how to handle a short read since that
> can happen under normal conditions if we're unlucky. The same is also
> true for files disappearing entirely.
>
> So, let's talk/think through a few cases:
>
> A file with just 'foo\n' in it- could that be a page starting with
> an LSN around 666F6F0A that we somehow only read the first few bytes of?
> If not, why not? I could possibly see an argument that we expect to
> always get at least 512 bytes in a read, or 4K, but it seems like we
> could possibly run into edge cases on odd filesystems or such. In the
> end, I'm leaning towards categorizing different things, well,
> differently- a short read would be reported as a NOTICE or equivilant,
> perhaps, meaning that the test case needs to do something more than just
> have a file with 'foo' in it, but that is likely a good things anyway-
> the test cases would be better if they were closer to real world. Other
> read failures would be reported in a more serious category assuming they
> are "this really shouldn't happen" cases. A file disappearing isn't a
> "can't happen" case, and might be reported at the same 'NOTICE' level
> (or maybe with a 'verbose' ption).

In the context of this patch, we should also discern whether a
particular case is merely a notice (or warning) on an offline cluster, I
guess you think it should be?

So I've changed it such that a short read emits a "warning" message,
increments a new skippedfiles (as it is not just a skipped block)
variable and reports its number at the end - should it then exit with >
0 even if there were no wrong checksums?

> A file that's 8k in size and has a checksum but it's not right seems
> pretty clear to me. Might as well include a count of pages which have a
> valid checksum, I would think, though perhaps only in a 'verbose' mode
> would that get reported.

What's the use for that? It already reports the number of scanned blocks
at the end, so that number is pretty easy to figure out from it and the
number of bad checksums.

> A completely zero'd page could also be reported at a NOTICE level or
> with a count, or perhaps only with verbose.

It is counted as a skipped block right now (well, every block that
qualifes for PageIsNew() is), but skipped blocks are not mentioned right
now. I guess the rationale is that it might lead to excessive screen
output (but then, verbose originally logged /every/ block), but you'd
have to check with the original authors.

So I have now changed behaviour so that short writes count as skipped
files and pg_verify_checksums no longer bails out on them. When this
occors a warning is written to stderr and their overall count is also
reported at the end. However, unless there are other blocks with bad
checksums, the exit status is kept at zero.

New patch attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachment Content-Type Size
online-verification-of-checksums_V10.patch text/x-patch 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-02-16 14:35:46 Re: [Patch][WiP] Tweaked LRU for shared buffers
Previous Message Sergei Kornilov 2019-02-16 11:50:35 Re: allow online change primary_conninfo