From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Online verification of checksums
On Tuesday, 2019-03-19 at 13:00 -0700, Andres Freund wrote:
> On 2019-03-20 03:27:55 +0800, Stephen Frost wrote:
> > On Tue, Mar 19, 2019 at 23:59 Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote:
> > > > On Tuesday, 2019-03-19 at 11:22 -0400, Robert Haas wrote:
> > > > > It's torn pages that I am concerned about - the server is writing and
> > > > > we are reading, and we get a mix of old and new content. We have been
> > > > > quite diligent about protecting ourselves from such risks elsewhere,
> > > > > and checksum verification should not be held to any lesser standard.
> > > >
> > > > If we see a checksum failure on an otherwise correctly read block in
> > > > online mode, we retry the block on the theory that we might have read a
> > > > torn page. If the checksum verification still fails, we compare its LSN
> > > > to the LSN of the current checkpoint and don't mind if it's newer. This
> > > > way, a torn page should not cause a false positive either way, I
> > > > think.
> > >
> > > False positives, no. But there's plenty of potential for false
> > > negatives. In plenty of clusters a large fraction of the pages is
> > > going to be touched in most checkpoints.
> > How is it a false negative? The page was in the middle of being
> > written,
> You don't actually know that. It could just be random gunk in the LSN,
> and this type of logic just ignores such failures as long as the random
> gunk is above the system's LSN.
Right, I think this needs to be taken into account. For pg_basebackup,
that'd mean an additional check against GetRedoRecPtr() or something
similar in the check below:
> Well, I don't know what to tell you. But:
> * Only check pages which have not been modified since the
> * start of the base backup. Otherwise, they might have been
> * written only halfway and the checksum would not be valid.
> * However, replaying WAL would reinstate the correct page in
> * this case. We also skip completely new pages, since they
> * don't have a checksum yet.
> if (!PageIsNew(page) && PageGetLSN(page) < startptr)
> doesn't consider plenty of scenarios, as pointed out above. It'd be one
> thing if the concerns I point out above were actually commented upon and
> weighed as not substantial enough (not that I know how). But...
> > Do you have any example cases where the code in pg_basebackup has resulted
> > in either a false positive or a false negative? Any case which can be
> > shown to result in either?
> CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 1000000) g(i);
> SELECT pg_relation_size('corruptme');
> postgres=# SELECT current_setting('data_directory') || '/' || pg_relation_filepath('corruptme');
> │ ?column? │
> │ /srv/dev/pgdev-dev/base/13390/16384 │
> (1 row)
> dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 conv=notrunc
> Try a basebackup and see how many times it'll detect the corrupt
> data. In the vast majority of cases you're going to see checksum
> failures when reading the data for normal operation, but not when using
> basebackup (or this new tool).
Right, see above.
> At the very very least this would need to do
> a) checks that the page is all zeroes if PageIsNew() (like
> PageIsVerified() does for the backend). That avoids missing cases
> where corruption just zeroed out the header, but not the whole page.
We can't run pg_checksum_page() on those afterwards though, as it would
fire an assertion:

pg_checksum_page: Assertion `!(((PageHeader) (&cpage->phdr))->pd_upper == 0)' failed.
But we should count it as a checksum error and generate an appropriate
error message in that case.
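The all-zeroes test could look like this in a frontend tool (a simplified sketch assuming the default BLCKSZ of 8192, not PostgreSQL's actual implementation):

```c
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192  /* assumed default PostgreSQL block size */

/*
 * Sketch of the all-zeroes test for pages that look new (zeroed
 * header), mirroring what PageIsVerified does in the backend.  A
 * genuinely new page is zero throughout; a page where corruption
 * zeroed only the header is not, and should be reported as a
 * verification failure rather than skipped -- and without calling
 * pg_checksum_page(), which asserts pd_upper != 0.
 */
static bool
page_is_all_zeroes(const char *page)
{
    for (size_t i = 0; i < BLCKSZ; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```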
> b) Check that pd_lsn is between startlsn and the insertion pointer. That
> avoids accepting just about all random data.
However, since pg_checksums is a stand-alone application, it can't just
access the insertion pointer, can it? We could maybe set a threshold
from the last checkpoint after which we consider the pd_lsn bogus. But
what's a good threshold here?
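Such a threshold check might be sketched like this (the slack value is an arbitrary placeholder, since what a good threshold would be is exactly the open question; `XLogRecPtr` is redefined only to keep the sketch self-contained):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/*
 * Sketch of a pd_lsn plausibility test for a stand-alone tool with no
 * access to the live WAL insertion pointer.  The only anchor available
 * is the last checkpoint's LSN from pg_control, so the upper bound is
 * that LSN plus some slack.  SLACK_BYTES is a hypothetical placeholder,
 * not a recommendation.
 */
#define SLACK_BYTES ((XLogRecPtr) 16 * 1024 * 1024)  /* hypothetical */

static bool
pd_lsn_plausible(XLogRecPtr page_lsn, XLogRecPtr checkpoint_lsn)
{
    return page_lsn <= checkpoint_lsn + SLACK_BYTES;
}
```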
And/or we could port the other sanity checks from PageIsVerified:
    if ((p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
        p->pd_lower <= p->pd_upper &&
        p->pd_upper <= p->pd_special &&
        p->pd_special <= BLCKSZ &&
        p->pd_special == MAXALIGN(p->pd_special))
        header_sane = true;
That should catch large-scale random corruption like you showed above.
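Ported to a frontend context, those checks might look like this (a sketch: the struct is a simplified stand-in for PageHeaderData, which has more fields, and the MAXALIGN macro assumes 8-byte alignment):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for definitions from PostgreSQL's bufpage.h. */
#define BLCKSZ 8192
#define PD_VALID_FLAG_BITS 0x0007
#define MAXALIGN(LEN) (((uintptr_t) (LEN) + 7) & ~((uintptr_t) 7))

typedef struct
{
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;
} PageHeaderSketch;

/*
 * Header sanity checks as in PageIsVerified: only valid flag bits set,
 * and the free-space/special-space pointers ordered, in range, and
 * aligned.  Large-scale random corruption (like the dd example above)
 * is very unlikely to pass all of these at once.
 */
static bool
header_sane(const PageHeaderSketch *p)
{
    return (p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
           p->pd_lower <= p->pd_upper &&
           p->pd_upper <= p->pd_special &&
           p->pd_special <= BLCKSZ &&
           p->pd_special == MAXALIGN(p->pd_special);
}
```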
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
credativ GmbH, HRB Mönchengladbach 12080
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz