Re: Offline enabling/disabling of data checksums

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-13 09:08:33
Message-ID: alpine.DEB.2.21.1903130805330.4059@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Now the set of patches is:
> - 0001, add --enable and --disable. I have tweaked a bit the patch so
> as "action" is replaced by "mode" which is more consistent with other
> tools like pg_ctl. pg_indent was also complaining about one of the
> new enum structures.

Patch applies cleanly, compiles, various make check ok, doc build ok.

I'm still at odds with the exit(1) behavior when there is nothing to do.

If this behavior is kept, I think that the documentation needs to be
improved because "failed" does not describe a no-op-was-needed to me.

If enabling or disabling checksums, the exit status is nonzero if the
operation failed.

Maybe: "... if the operation failed or the requested setting is already
active." would at least describe clearly the implemented behavior.

+ printf(_(" -c, --check check data checksums\n"));
+ printf(_(" This is the default mode if nothing is specified.\n"));

I'm not sure of the punctuation logic on the help line: the first sentence
does not end with a ".". I could not find an instance of this style in
other help on pg commands. I'd suggest "check data checksums (default)"
would work around and be more in line with other commands help.

I see a significant locking issue, which I discussed on other threads
without convincing anyone. I could do the following things:

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new
file, then started a "pg_checksums --enable" on a stopped cluster, then
started the cluster while the enabling was in progress, then connected and
updated data. Hmmm. Then I stopped while the slow enabling was still in
progress. Then I could also run a fast pg_checksums --enable in parallel,
overtaking the first one... then when the fast one finished, I started the
cluster again. When the slow one finished, it overwrote the control file,
I had a running cluster with a control file which did not say so, so I
could disable the checksum. Hmmm again. Ok, I could not generate a
inconsistent state because on stopping the cluster the cluster file is
overwritten with the initial state from the point of view of postmater,
but it does not look good.

I do not think it is a good thing that two commands can write to the data
directory at the same time, really.

About fsync-ing: ISTM that it is possible that the control file is written
to disk while data are still not written, so a failure in between would
leave the cluster with an inconsistent state. I think that it should fsync
the data *then* update the control file and fsync again on that one.

> - 0002, add --no-sync.

Patch applies cleanly, compiles, various make checks are ok, doc build ok.

Fine with me.


In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-03-13 09:13:28 Re: Fix volatile vs. pointer confusion
Previous Message Georgios Kokolatos 2019-03-13 08:55:10 Re: CPU costs of random_zipfian in pgbench