Re: [PATCH] Verify Checksums during Basebackups

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-22 16:22:31
Message-ID: f2c3b94f-3100-2412-1077-86caf7c62551@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Michael,

On 3/17/18 5:34 PM, Michael Banck wrote:
>
> On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
>
> I think most people (including those I had off-list discussions about
> this with) were of the opinion that such an option should be there, so I
> added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> replication command and also an option -k / --no-verify-checksums to
> pg_basebackup to trigger this.
>
> Updated patch attached.

+ memcpy(page, (buf + BLCKSZ * i), BLCKSZ);

Why make a copy here? How about:

char *page = buf + BLCKSZ * i

I know pg_checksum_page manipulates the checksum field but I have found
it to be safe.

+ if (phdr->pd_checksum != checksum)

I've attached a patch that adds basic retry functionality. It's not
terrible efficient since it rereads the entire buffer for any block
error. A better way is to keep a bitmap for each block in the buffer,
then on retry compare bitmaps. If the block is still bad, report it.
If the block was corrected moved on. If a block was good before but is
bad on retry it can be ignored.

+ ereport(WARNING,
+ (errmsg("checksum verification failed in file "

I'm worried about how verbose this warning could be since there are
131,072 blocks per segment. It's unlikely to have that many block
errors, but users do sometimes put files in PGDATA which look like they
should be validated. Since these warnings all go to the server log it
could get pretty bad.

We should either stop warning after the first failure, or aggregate the
failures for a file into a single message.

Some tests with multi-block errors should be added to test these scenarios.

Thanks,
--
-David
david(at)pgmasters(dot)net

Attachment Content-Type Size
reread.patch text/plain 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-03-22 16:27:00 Re: FOR EACH ROW triggers on partitioned tables
Previous Message Tom Lane 2018-03-22 16:19:49 Re: Error detail/hint style fixup