Re: Offline enabling/disabling of data checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-01-05 22:12:14
Message-ID: 20190105221214.GW2528@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On 12/27/18 11:43 AM, Magnus Hagander wrote:
> > Plus, the majority of people *should* want them on :) We don't run with
> > say synchronous_commit=off by default either to make it easier on those
> > that don't want to pay the overhead of full data safety :P (I know it's
> > not a direct match, but you get the idea)

+1 to having them on by default, we should have done that a long time
ago.

> I don't know, TBH. I agree making the on/off change cheaper moves us
> closer to 'on' by default, because users may disable it if needed. But
> it's not the whole story.
>
> If we enable checksums by default, 99% of users will have them enabled.

Yes, and they'll then be able to catch data corruption much earlier.
Today, 99% of our users don't have them enabled and have no clue if
their data has been corrupted on disk, or not. That's not good.

> That means more people will actually observe data corruption cases that
> went unnoticed so far. What shall we do with that? We don't have very
> good answers to that (tooling, docs) and I'd say "disable checksums" is
> not a particularly amazing response in this case :-(

Now that we've got a number of tools available which check the
checksums in a running system and raise warnings when they find a
mismatch (pg_basebackup, pgBackRest and, I think, other backup tools,
pg_checksums...), users will see corruption and have the option to
restore from a backup before those backups expire and they're left
with a corrupt database and backups which also contain that corruption.
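For readers unfamiliar with what such a check involves: each data page carries a 16-bit checksum in its header, and a verifier recomputes it and compares. The following is a minimal Python sketch of that loop, not the code of any of the tools above. It assumes the default 8 kB block size and a little-endian build, and `compute_checksum` is a hypothetical stand-in for PostgreSQL's actual page checksum routine, which is not reproduced here.

```python
import struct
from typing import Callable, Iterator, Tuple

BLCKSZ = 8192            # assumption: the cluster uses the default block size
PD_CHECKSUM_OFFSET = 8   # pd_checksum follows the 8-byte pd_lsn in the page header
PD_UPPER_OFFSET = 14     # pd_upper; 0 marks a new, never-initialized page


def find_checksum_failures(
    data: bytes,
    compute_checksum: Callable[[bytes, int], int],
    first_blkno: int = 0,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (block number, stored, computed) for each page whose checksum mismatches.

    compute_checksum is a hypothetical stand-in for PostgreSQL's page checksum
    routine; it receives the page bytes and block number and is expected to
    ignore the stored pd_checksum field, as the real routine does.
    A trailing partial block is silently skipped here; real tools report it.
    """
    for offset in range(0, len(data) - BLCKSZ + 1, BLCKSZ):
        page = data[offset:offset + BLCKSZ]
        blkno = first_blkno + offset // BLCKSZ
        # Assumes a little-endian build: header fields are stored in native byte order.
        (pd_upper,) = struct.unpack_from("<H", page, PD_UPPER_OFFSET)
        if pd_upper == 0:
            continue  # new pages have no checksum yet, so skip them
        (stored,) = struct.unpack_from("<H", page, PD_CHECKSUM_OFFSET)
        computed = compute_checksum(page, blkno)
        if stored != computed:
            yield blkno, stored, computed
```

Injecting the checksum function keeps the sketch honest about what it does not implement; the point is only that detection is cheap and mechanical once checksums are enabled, which is what lets backup tools flag corruption while a clean backup still exists.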

This ongoing call for specific tooling to do "something" about checksums
is certainly good, but it's not right to say that we don't have existing
documentation; we do, quite a bit of it, and it's all under the heading
of "Backup and Recovery".

> FWIW I don't know what to do about that. We certainly can't prevent the
> data corruption, but maybe we could help with fixing it (although that's
> bound to be low-level work).

There's been some effort to automagically correct corrupted pages, but
it's certainly not something I'm ready to trust beyond a "well, this is
what it might have been" review. The answer today is to
find a backup which isn't corrupt and restore from it on a known-good
system. If adding explicit documentation to that effect would reduce
your level of concern when it comes to enabling checksums by default,
then I'm happy to do that.

Thanks!

Stephen
