Re: Offline enabling/disabling of data checksums

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-20 22:59:24
Message-ID: 20190320225924.GC20192@paquier.xyz
Lists: pgsql-hackers

On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
> I think that the motivation/risks should appear before the solution ("As xyz
> ..., ..."), or at least the logical link should be outlined.
>
> It is not clear to me whether the following sentences, which seem specific
> to "pg_rewind", are linked to the previous advice, which seems rather to
> refer to streaming replication?

Do you have a better formulation in mind? If you have a failover
setup which makes use of pg_rewind, or use a backup tool which does
incremental copies of physical blocks like pg_rman, then you risk
messing up the instances of a cluster, which is the risk I am trying
to outline here. It is actually fine to do the following if
everything is WAL-based, as checksums are only computed once a shared
buffer is flushed on a single instance. Imagine for example a
primary-standby pair with checksums disabled (a shell sketch follows
the list):
- Cleanly shut down the standby, enable checksums on it.
- Plug the standby back into the cluster, let it replay up to the
latest point.
- Cleanly shut down the primary.
- Promote the standby.
- Enable checksums on the previous primary.
- Add recovery parameters and reconnect the previous primary to the
cluster.
- All nodes have checksums enabled, without rebuilding any of them.
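
For illustration, here is a minimal shell sketch of the steps above.
It assumes the pg_checksums --enable switch this patch adds and a
v12-style standby setup (standby.signal plus primary_conninfo); host
names and paths are made up:

    # On the standby: clean stop, enable checksums offline, restart,
    # then let it replay up to the latest point.
    pg_ctl -D /data/standby stop -m fast
    pg_checksums --enable -D /data/standby
    pg_ctl -D /data/standby start

    # Cleanly stop the primary, then promote the standby.
    pg_ctl -D /data/primary stop -m fast
    pg_ctl -D /data/standby promote

    # Enable checksums on the previous primary, then rejoin it to the
    # cluster as a standby.
    pg_checksums --enable -D /data/primary
    touch /data/primary/standby.signal
    echo "primary_conninfo = 'host=standby-host port=5432'" \
        >> /data/primary/postgresql.conf
    pg_ctl -D /data/primary start
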
Per what I could see on this thread, folks tend to point out that we
should *not* include such recommendations in the docs, as it is too
easy to make a pilot error.

> > + When using a cluster with
> > + tools which perform direct copies of relation file blocks (for example
> > + <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
> > + lead to page corruptions in the shape of incorrect checksums if the
> > + operation is not done consistently across all nodes. Destroying all
> > + the standbys in a cluster first, enabling or disabling checksums on
> > + the primary and finally recreating the cluster nodes from scratch is
> > + also safe.
> > + </para>
> > + </refsect1>
>
> Should not disabling in the reverse order be safe? The checksums are not
> checked afterwards?

I don't quite understand your comment about the ordering. If all
the standbys are destroyed first, then enabling/disabling checksums
happens in a single place.
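
As a hedged sketch of that single-place approach (same assumptions
about tool names, hosts and paths as the sketch above):

    # Standbys destroyed; the cluster is now just the primary.
    pg_ctl -D /data/primary stop -m fast
    pg_checksums --enable -D /data/primary
    pg_ctl -D /data/primary start

    # Recreate each standby from scratch; -R writes the recovery
    # settings (standby.signal and primary_conninfo) for us.
    pg_basebackup -h primary-host -D /data/standby -R
    pg_ctl -D /data/standby start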

> After the reboot, some data files are not fully updated with their
> checksums, although the control file says that they are. It should then
> fail after a restart when a no-checksum page is loaded?
>
> What am I missing?

Please note that we do the same in other tools and live fine with
it, pg_basebackup and pg_rewind just to name two. I am not saying
that it is not a problem in some cases, but I am saying that it is
not a problem this patch should solve. If we were to do something
about it, it could make sense to make fsync_pgdata() smarter so that
the control file is flushed last, or to define flush strategies
there.
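
To make that ordering concrete, here is a hedged shell illustration
(GNU coreutils sync >= 8.24, which fsyncs its file arguments); the
real change would live inside fsync_pgdata(), this only mirrors the
idea:

    # Flush every data file to disk first...
    find "$PGDATA" -type f ! -name pg_control -exec sync -- {} +
    # ...and only then the control file, so that a crash in between
    # leaves pg_control still claiming the old checksum state.
    sync -- "$PGDATA/global/pg_control"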
--
Michael
