Re: Offline enabling/disabling of data checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2018-12-28 09:12:24
Message-ID: CABUevEx_JyTWpSRkSf-Wyk+zuZqiOK=t5amqvD4bCQHyDjUfpQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 28, 2018 at 1:14 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
>
> On 12/28/18 12:25 AM, Michael Paquier wrote:
> > On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
> >> On 12/27/18 11:43 AM, Magnus Hagander wrote:
> >>> Should we double-check with packagers that this won't cause a problem?
> >>> Though the fact that it's done in a major release should make it
> >>> perfectly fine I think -- and it's a smaller change than when we did all
> >>> those xlog->wal changes...
> >>>
> >>
> >> I think it makes little sense to not rename the tool now. I'm pretty
> >> sure we'd end up doing that sooner or later anyway, and we'll just live
> >> with a misnamed tool until then.
> >
> > Do you think that a thread on -packagers would be more adapted then?
> >
>
> I'm sorry, but I'm not sure I understand the question. Of course, asking
> over at -packagers won't hurt, but my guess is the response will be it's
> not a big deal from the packaging perspective.
>

I think a heads-up in the way of "planning to change it, now's the time to
yell" is the reasonable thing.

> >> I don't know, TBH. I agree making the on/off change cheaper moves
> >> us closer to 'on' by default, because they may disable it if needed. But
> >> it's not the whole story.
> >>
> >> If we enable checksums by default, 99% of users will have them enabled.
> >> That means more people will actually observe data corruption cases that
> >> went unnoticed so far. What shall we do with that? We don't have very
> >> good answers to that (tooling, docs) and I'd say "disable checksums" is
> >> not a particularly amazing response in this case :-(
> >
> > Enabling data checksums by default is still a couple of steps ahead,
> > without a way to control them better.
> >
>
> What do you mean by "control" here? Dealing with checksum failures, or
> some additional capabilities?
>
> >> FWIW I don't know what to do about that. We certainly can't prevent the
> >> data corruption, but maybe we could help with fixing it (although that's
> >> bound to be low-level work).
> >
> > Yes, data checksums are extremely useful to tell people when the
> > problem is *not* from Postgres, which can be really hard in a large
> > organization. Knowing about the corrupted page is also useful as you
> > can look at its contents and look at its bytes before it gets zero'ed
> > to spot patterns which can help other teams in charge of a lower level
> > of the application layer.
>
> I'm not sure data checksums are particularly great evidence. For example
> with the recent fsync issues, we might have ended with partial writes
> (and thus invalid checksums). The OS might have even told us about the
> failure, but we've gracefully ignored it. So I'm afraid data checksums
> are not a particularly great proof it's not our fault.
>

They are great evidence that your data is corrupt. You *want* to know
that your data is corrupt. Even if our best recommendation is "go restore
your backups", you still want to know. Otherwise you are sitting around on
data that's corrupt and you don't know about it.

There are certainly many things we can do to improve the experience. But
not telling people their data is corrupt when it is, isn't one of them.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
