Re: [PATCH] Verify Checksums during Basebackups

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-23 09:36:27
Message-ID: 1521797787.15036.23.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

thanks for the review!

Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele:
> On 3/17/18 5:34 PM, Michael Banck wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> >
> > Updated patch attached.
>
> + memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
>
> Why make a copy here? How about:
>
> char *page = buf + BLCKSZ * i

Right, ok.

> I know pg_checksum_page manipulates the checksum field but I have found
> it to be safe.
>
> + if (phdr->pd_checksum != checksum)
>
> I've attached a patch that adds basic retry functionality. It's not
> terrible efficient since it rereads the entire buffer for any block
> error. A better way is to keep a bitmap for each block in the buffer,
> then on retry compare bitmaps. If the block is still bad, report it.
> If the block was corrected moved on. If a block was good before but is
> bad on retry it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.

I think it would be very useful if we could come up with a testcase
showing this problem, but I guess this will be quite hard to hit
reproducibly, right?

> + ereport(WARNING,
> + (errmsg("checksum verification failed in file "
>
> I'm worried about how verbose this warning could be since there are
> 131,072 blocks per segment. It's unlikely to have that many block
> errors, but users do sometimes put files in PGDATA which look like they
> should be validated. Since these warnings all go to the server log it
> could get pretty bad.

We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.

> We should either stop warning after the first failure, or aggregate the
> failures for a file into a single message.

I agree that major corruption could make the whole output blow up but I
would prefer to keep this feature simple for now, which implies possibly
printing out a lot of WARNING or maybe just stopping after the first
one (or first few, dunno).

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-03-23 09:57:15 Re: Faster inserts with mostly-monotonically increasing values
Previous Message Ashutosh Bapat 2018-03-23 09:30:37 Comment update in BuildTupleFromCStrings()