Re: Offline enabling/disabling of data checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Christoph Berg <myon(at)debian(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Sergei Kornilov <sk(at)zsrv(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-15 08:57:50
Message-ID: CABUevExPBFPoXXSNALMeneTZn7nLTTpNhDZSUcXjqr6NMdoFgQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 14, 2019 at 4:54 PM Michael Banck <michael(dot)banck(at)credativ(dot)de> wrote:

> Hi,
>
> Am Donnerstag, den 14.03.2019, 15:32 +0100 schrieb Magnus Hagander:
> > On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon(at)debian(dot)org> wrote:
> > > Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g(at)mail(dot)gmail(dot)com>
> > > > Are you suggesting we should support running with a master with
> > > > checksums on and a standby with checksums off in the same cluster?
> > > > That seems.. Very fragile.
> > >
> > > The case "shut down master and standby, run pg_checksums on both, and
> > > start them again" should be supported. That seems safe to do, and a
> > > real-world use case.
> >
> > I can agree with that, if we can declare it safe. You might need some
> > way to ensure it was shut down cleanly on both sides, I'm guessing.
> >
> > > Changing the system id to a random number would complicate this.
> > >
> > > (Horrible idea: maybe just adding 1 (= checksum version) to the system
> > > id would work?)
> >
> > Or any other way of changing the systemid in a predictable way would
> > also work, right? As long as it's done the same on both sides. And
> > that way it would look different to any system that *doesn't* know
> > what it means, which is probably a good thing.
>
> If we change the system identifier, we'll have to reset the WAL as well
> or otherwise we'll get "PANIC: could not locate a valid checkpoint
> record" on startup. So even if we do it predictably on both primary and
> standby I guess the standby would need to be re-cloned?
>
> So I think an option that skips that for people who know what they are
> doing with the streaming replication setup would be required, should we
> decide to bump the system identifier.
>

Ugh. I did not think of that one. But yes, the main idea there would be
that if you turn on checksums on the primary then you have to re-clone all
standbys. That's what happens if we change the system identifier -- that's
why it's the "big hammer method".

But yeah, an option to avoid it could be one way to deal with it. If we
could find some safer way to handle it, that'd be better; otherwise,
changing the sysid by default and having an option to turn that off seems
like a workable fallback.
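
To make the "predictable change" idea from upthread concrete, here is a
minimal sketch in Python, not a proposal for the actual patch. The helper
name and the sample identifier are made up for illustration; it assumes the
system identifier is a 64-bit unsigned value and that the checksum version
is 1 when checksums are on. It only shows the arithmetic, and does nothing
about the WAL problem Michael describes.

```python
UINT64_MASK = (1 << 64) - 1  # assumption: the system identifier is a uint64


def derive_system_identifier(old_sysid: int, checksum_version: int) -> int:
    """Hypothetical helper: shift the identifier by the checksum version.

    Wraps around at 64 bits so the result always fits the same field.
    """
    return (old_sysid + checksum_version) & UINT64_MASK


# Made-up example value; a streaming standby shares the primary's identifier.
primary_sysid = 6668260923436213246
standby_sysid = primary_sysid

# Applying the same rule independently on each node gives the same result,
# and that result differs from the identifier both nodes started with.
new_primary = derive_system_identifier(primary_sysid, 1)
new_standby = derive_system_identifier(standby_sysid, 1)
```

Because the rule is deterministic, both sides still agree with each other
afterwards, while anything that only knows the old identifier sees a
mismatch.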

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
