Re: Offline enabling/disabling of data checksums

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2018-12-27 23:25:29
Message-ID: 20181227232529.GA3210@paquier.xyz
Lists: pgsql-hackers

On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
> On 12/27/18 11:43 AM, Magnus Hagander wrote:
>> Should we double-check with packagers that this won't cause a problem?
>> Though the fact that it's done in a major release should make it
>> perfectly fine I think -- and it's a smaller change than when we did all
>> those xlog->wal changes...
>>
>
> I think it makes little sense to not rename the tool now. I'm pretty
> sure we'd end up doing that sooner or later anyway, and we'll just live
> with a misnamed tool until then.

Do you think that a thread on -packagers would be more appropriate then?

> I don't know, TBH. I agree making the on/off change cheaper moves
> us closer to 'on' by default, because they may disable it if needed. But
> it's not the whole story.
>
> If we enable checksums by default, 99% users will have them enabled.
> That means more people will actually observe data corruption cases that
> went unnoticed so far. What shall we do with that? We don't have very
> good answers to that (tooling, docs) and I'd say "disable checksums" is
> not a particularly amazing response in this case :-(

Enabling data checksums by default is still a couple of steps away as
long as there is no better way to control them.

> FWIW I don't know what to do about that. We certainly can't prevent the
> data corruption, but maybe we could help with fixing it (although that's
> bound to be low-level work).

Yes, data checksums are extremely useful to tell people when the
problem is *not* from Postgres, which can be really hard to prove in a
large organization. Knowing which page is corrupted is also useful, as
you can look at its contents byte by byte before it gets zeroed and
spot patterns that can help the other teams in charge of the lower
layers of the stack.
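
To illustrate the kind of inspection meant here, a minimal sketch in
Python that pulls one block out of a relation segment file, decodes its
page header, and lists chunks that are entirely zeroes. This is only an
illustration and not part of any patch. It assumes the default 8kB block
size, a little-endian server, and the usual 24-byte page header layout;
the helper names and the 512-byte chunk size are made up for the example,
and zeroed chunks are just one pattern one might look for.

```python
import struct

BLCKSZ = 8192  # assumption: default block size; custom builds can differ

# Fixed page header: pd_lsn (two uint32), then pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version (uint16 each),
# then pd_prune_xid (uint32). "<" assumes a little-endian server.
PAGE_HEADER = struct.Struct("<IIHHHHHHI")


def read_page(path, blkno, blcksz=BLCKSZ):
    """Return the raw bytes of block blkno from a relation segment file."""
    with open(path, "rb") as f:
        f.seek(blkno * blcksz)
        page = f.read(blcksz)
    if len(page) != blcksz:
        raise ValueError("short read: block %d is not in %s" % (blkno, path))
    return page


def parse_header(page):
    """Decode the fixed 24-byte page header into a dict."""
    names = ("xlogid", "xrecoff", "pd_checksum", "pd_flags", "pd_lower",
             "pd_upper", "pd_special", "pd_pagesize_version", "pd_prune_xid")
    return dict(zip(names, PAGE_HEADER.unpack_from(page, 0)))


def zero_runs(page, chunk=512):
    """List offsets of chunk-aligned ranges that contain only zeroes."""
    zero = bytes(chunk)
    return [off for off in range(0, len(page), chunk)
            if page[off:off + chunk] == zero]
```

Run against a copy of the file taken while the server is stopped, the
header fields and the list of zeroed offsets can be handed over to the
storage or OS people as a starting point. The page should be saved
before anything is allowed to zero it.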
--
Michael
