Re: More issues with pg_verify_checksums and checksum verification in base backups

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, andrew(at)dunslane(dot)net, daniel(at)yesql(dot)se, magnus(at)hagander(dot)net, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: More issues with pg_verify_checksums and checksum verification in base backups
Date: 2018-11-20 03:17:19
Message-ID: 20181120031719.GY3415@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2018-11-19 21:18:43 -0500, Stephen Frost wrote:
> > As has been mentioned elsewhere, there's really a 'right' way to do
> > things and allowing PG to be 'extensible' by simply ignoring random
> > files showing up isn't that- if we want PG to be extensible in this way
> > then we need to provide a mechanism for that to happen.
>
> I still don't buy this argument. I'm giving up here, as I just don't
> have enough energy to keep up with this discussion.
>
> FWIW, I think it's bad, that we don't error out on checksum failures in
> basebackups by default. And that's only really feasible with a
> whitelisting approach.

No, we could error out on checksum failures in either approach, but we
explicitly don't with good reason: if you're doing a backup, you
probably want to actually capture the current data.

This is something we've thought quite a bit about. In fact, as I
recall, the original pg_basebackup code actually *did* error out, even
with the blacklist approach, and we made a solid argument, ultimately
agreed to by those involved at the time, that erroring out half-way
through was a bad idea.

What we do instead is exit with a non-zero exit code to make it clear
that there was an issue, to allow the user to capture that and raise
alarms, but to still have all of the data which we were able to
capture in the hopes that the backup is at least salvageable. In
addition, at least in pgbackrest, we don't consider such a backup to be
pristine and therefore we don't expire the prior backups. pg_basebackup
doesn't do any backup expiration itself, so it's up to the user to make
sure that, if pg_basebackup exits with a non-zero exit code, they
capture and handle that and *don't* blow away a previously good
backup.
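
To make that concrete, here is a rough sketch of the kind of wrapper a
user might put around a backup command. It is only an illustration under
stated assumptions, not anything shipped with PostgreSQL or pgbackrest:
the function name backup_then_expire and the paths in the usage comment
are hypothetical. It runs the command, and removes older backups only
when the command exits with status zero.

```python
import shutil
import subprocess
import sys


def backup_then_expire(backup_cmd, prior_backups):
    """Run backup_cmd; delete prior_backups only if it exits with status 0.

    backup_then_expire is a hypothetical helper for illustration.
    Returns the exit status of the backup command.
    """
    result = subprocess.run(backup_cmd)
    if result.returncode != 0:
        # Non-zero exit: something went wrong (for example, checksum
        # failures were reported). Keep whatever the new backup captured
        # AND keep every older backup, then let the caller raise alarms.
        print(
            f"backup exited with status {result.returncode}; "
            "keeping prior backups",
            file=sys.stderr,
        )
        return result.returncode

    # Clean exit: only now is it safe to expire the older backups.
    for old in prior_backups:
        shutil.rmtree(old, ignore_errors=True)
    return 0


# Example call (paths are made up):
#   backup_then_expire(["pg_basebackup", "-D", "/backups/new"],
#                      ["/backups/old"])
```

The ordering is the point here: expiration is gated on the exit status,
so a failed or suspect backup can never cause a known-good one to be
removed.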

The very last thing *any* backup tool should do is give up half-way
through and throw a nasty error, leaving you with the knowledge that
your system is hosed *and* that no backup of what was there exists, and
making it extremely likely that whatever corruption exists is being
propagated further.

Let's try not to conflate these two issues, though; they're quite
independent.

Thanks!

Stephen
