Re: Offline enabling/disabling of data checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Andres Freund <andres(at)anarazel(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-22 09:04:02
Message-ID: 1553245442.4884.3.camel@credativ.de
Lists: pgsql-hackers

Hi,

On Friday, 22.03.2019, at 17:37 +0900, Michael Paquier wrote:
> On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:
> > Don't we need a big warning that the cluster must not be started during
> > operation of pg_checksums as well, now that we don't disallow it?
>
> The same applies to pg_rewind and pg_basebackup, so I would classify
> that as a pilot error.

How would it apply to pg_basebackup? The cluster is running while the
base backup is taken, and I believe the control file is written at the
end, so you can't start another instance off the backup directory until
the base backup has finished.

It would apply to pg_rewind, but pg_rewind's runtime does not scale with
cluster size, does it? pg_checksums will run for hours on large clusters,
so the window for errors is much larger, and I don't think you can easily
compare the two.

> How would you formulate that in the docs if you add it?

(I would try to make sure you can't start the cluster, but that seems off
the table for now.)

How about this:

+ <refsect1>
+ <title>Notes</title>
+ <para>
+ When enabling checksums in a cluster, the operation can potentially take a
+ long time if the data directory is large. During this operation, the
+ cluster or other programs that write to the data directory must not be
+ started, or else data loss will occur.
+ </para>
+
+ <para>
+ When disabling or enabling checksums in a cluster of multiple instances,
[...]

Also, the following is not very clear to me:

+   If the event of a crash of the operating system while enabling or

s/If/In/

+ disabling checksums, the data folder may have checksums in an inconsistent
+   state, in which case it is recommended to check the state of checksums
+   in the data folder.

How is the user supposed to check the state of checksums? Do you mean
that if the user intended to enable checksums and the box dies in
between, they should check whether checksums are actually enabled and
re-run if not? It could also mean running pg_checksums --check on the
cluster, which wouldn't work in that case, as the control file has not
been updated yet.
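For illustration only (not part of the proposed patch), a minimal sketch of the first interpretation: read the checksum state recorded in the control file via pg_controldata, whose "Data page checksum version" line is 0 when checksums are disabled. Since the control file is only updated at the end of the operation, after an interrupted run it would still show the previous state. The helper names (`checksum_version`, `read_controldata`, `checksums_enabled`) are hypothetical, and the sketch assumes pg_controldata is on PATH and prints English labels under `LC_ALL=C`.

```python
import os
import re
import subprocess

# Matches e.g. "Data page checksum version:           1" in pg_controldata output.
_CHECKSUM_LINE = re.compile(
    r"^Data page checksum version:[ \t]*(\d+)[ \t]*$", re.MULTILINE
)


def checksum_version(controldata_output: str) -> int:
    """Return the data page checksum version recorded in the control file.

    0 means checksums are disabled; a non-zero value means they are enabled.
    """
    match = _CHECKSUM_LINE.search(controldata_output)
    if match is None:
        raise ValueError("no 'Data page checksum version' line in output")
    return int(match.group(1))


def read_controldata(datadir: str) -> str:
    """Run pg_controldata on datadir (assumed to be on PATH)."""
    env = dict(os.environ, LC_ALL="C")  # avoid translated field labels
    result = subprocess.run(
        ["pg_controldata", datadir],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout


def checksums_enabled(datadir: str) -> bool:
    """Hypothetical helper: True if the control file says checksums are on."""
    return checksum_version(read_controldata(datadir)) != 0
```

If `checksums_enabled()` still returns False after an interrupted attempt to enable checksums, the user would simply run `pg_checksums --enable` again.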

Maybe it could be formulated like this: "If pg_checksums is aborted or
killed while enabling or disabling checksums, the cluster will have the
same state with respect to checksums as before the operation, and
pg_checksums needs to be run again."?

Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz
