|From:||Michael Banck <michael(dot)banck(at)credativ(dot)de>|
|To:||Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>|
|Cc:||PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: [patch] Fix checksum verification in base backups for zero page headers|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Am Mittwoch, den 02.09.2020, 16:50 +0300 schrieb Anastasia Lubennikova:
> On 01.09.2020 13:22, Michael Banck wrote:
> > as a continuation of , I've split-off the zero page header case from
> > the last patch, as this one seems less contentious.
> >  https://commitfest.postgresql.org/28/2308/
> I've looked through the previous discussion. As far as I got it, most of
> the controversy was about online checksums improvements.
> The warning about pd_upper inconsistency that you've added is a good
> addition. The patch is a bit messy, though, because a huge code block
> was shifted.
> Will it be different, if you just leave
> "if (!PageIsNew(page) && PageGetLSN(page) < startptr)"
> block as it was, and add
> "else if (PageIsNew(page) && !PageIsZero(page))" ?
Thanks, that does indeed look better as a patch and I think it's fine
as-is for the code as well, I've attached a v2.
> While on it, I also have a few questions about the code around:
All fair points, but I think those should be handled in another patch,
> 1) Maybe we can use other header sanity checks from PageIsVerified() as
> well? Or even the function itself.
Yeah, I'll keep that in mind.
> 2) > /* Reset loop to validate the block again */
> How many times do we try to reread the block? Is one reread enough?
Not sure whether more rereads would help, but I think the general
direction was to replace this with something better anyway I think (see
> Speaking of which, 'reread_cnt' name looks confusing to me. I would
> expect that this variable contains a number of attempts, not the number
> of bytes read.
Well, there are other "cnt" variables that keep the number of bytes read
in that file, so it does not seem to be out of place for me. A comment
might not hurt though.
On second glance, it should maybe also be of size_t and not int type, as
is the other cnt variable?
> If everyone agrees, that for basebackup purpose it's enough to rely on a
> single reread, I'm ok with it.
> Another approach is to read the page directly from shared buffers to
> ensure that the page is fine. This way is more complicated, but should
> be almost bullet-proof. Using it we can also check pages with lsn >=
Right, that's what Andres also advocated for initially I think, and what
should be done going forward.
> 3) Judging by warning messages, we count checksum failures per file, not
> per page, and don't report after a fifth failure. Why so? Is it a
> common case that so many pages of one file are corrupted?
I think this was a request during the original review, on the basis that
if we have more than a few checksum errors, it's likely there is
something fundamentally broken and it does not make sense to spam the
log with potentially millions of broken checksums.
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
credativ GmbH, HRB Mönchengladbach 12080
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
|Next Message||Michael Banck||2020-09-22 14:30:19||Re: [patch] Fix checksum verification in base backups for zero page headers|
|Previous Message||Tom Lane||2020-09-22 14:18:40||Re: pgindent vs dtrace on macos|