Re: [PATCH] Verify Checksums during Basebackups

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date: 2018-03-04 14:49:18
Message-ID: 20180304144918.GQ2416@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings Magnus, all,

* Magnus Hagander (magnus(at)hagander(dot)net) wrote:
> I think it would also be a good idea to have this a three-mode setting,
> with "no check", "check and warning", "check and error". Where "check and
> error" should be the default, but you could turn off that in "save whatever
> is left mode". But I think it's better if pg_basebackup simply fails on a
> checksum error, because that will make it glaringly obvious that there is a
> problem -- which is the main point of checksums in the first place. And
> then an option to turn it off completely in cases where performance is the
> thing.

When we implemented page-level checksum checking in pgBackRest, David
and I had a good long discussion about exactly this question of "warn"
vs. "error" and came to a different conclusion- you want a backup to
always back up as much as it can even in the face of corruption. If the
user has set up their backups in such a way that they don't see the
warnings being thrown, it's a good bet they won't see failed backups
happening either, in which case they might go from having "mostly" good
backups to not having any. Note that I *do* think a checksum failure
should result in an non-zero exit-code result from pg_basebackup,
indicating that there was something which went wrong.

One difference is that with pgBackRest, we manage the backups and a
backup with page-level checksums isn't considered "valid", so we won't
expire old backups if a new backup has a checksum failure, but I'm not
sure that's really enough to change my mind on if pg_basebackup should
outright fail on a checksum error or if it should throw big warnings but
still try to perform the backup. If the admin sets things up in a way
that a warning and error-exit code from pg_basebackup is ignored and
they still expire out their old backups, then even having an actual
error result wouldn't change that.

As an admin, the first thing I would want in a checksum failure scenario
is a backup of everything, even the blocks which failed (and then a
report of which blocks failed...). I'd rather we think about that
use-case than the use-case where the admin sets up backups in such a way
that they don't see warnings being thrown from the backup.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-03-04 15:00:20 Re: [PATCH] btree_gin, add support for uuid, bool, name, bpchar and anyrange types
Previous Message Michael Banck 2018-03-04 14:24:18 Re: Online enabling of checksums