Re: Offline enabling/disabling of data checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-13 10:41:15
Message-ID: 1552473675.4947.62.camel@credativ.de
Lists: pgsql-hackers

On Wednesday, 13.03.2019 at 18:31 +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > I'm not sure of the punctuation logic on the help line: the first sentence
> > does not end with a ".". I could not find an instance of this style in other
> > help on pg commands. I'd suggest "check data checksums (default)" would work
> > around and be more in line with other commands help.
>
> Good idea, let's do that.
>
> > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > then started a "pg_checksums --enable" on a stopped cluster, then started
> > the cluster while the enabling was in progress, then connected and updated
> > data.
>
> Well, yes, don't do that. You can get into the same class of problems
> while running pg_rewind, pg_basebackup or even pg_resetwal once the
> initial control file check is done for each one of these tools.
>
> > I do not think it is a good thing that two commands can write to the data
> > directory at the same time, really.
>
> We don't prevent either a pg_resetwal and a pg_basebackup to run in
> parallel. That would be... Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also do not seem
to be much of a problem, as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we have
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.
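
To make the proposal concrete, here is a rough sketch in Python rather
than actual pg_checksums code. The JSON file stands in for pg_control,
and every name in it (read_control, write_control, finish_enable, the
field names) is made up for illustration:

```python
import json
import os

# Hypothetical stand-in for pg_control: a small JSON file with the fields
# "state", "checkpoint_lsn" and "data_checksum_version".
SHUTDOWN = "shut down"
IN_PRODUCTION = "in production"


def read_control(path):
    with open(path) as f:
        return json.load(f)


def write_control(path, control):
    with open(path, "w") as f:
        json.dump(control, f)
        f.flush()
        os.fsync(f.fileno())


def finish_enable(path):
    """Last step of --enable: re-read, re-check, then flip the version."""
    control = read_control(path)          # fresh copy, not the one from startup
    if control["state"] != SHUTDOWN:      # (i) the instance must still be offline
        raise RuntimeError("cluster was started while checksums were being enabled")
    control["data_checksum_version"] = 1  # (ii) change only this field
    write_control(path, control)
    return control
```

Because the version is set on the freshly read copy, whatever checkpoint
LSN is in the file at that point is preserved instead of being replaced
by the value read at startup.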

Another option would be to add a new feature that reliably blocks an
instance from starting up during maintenance: either a new control file
field, some other trigger file, or a message in postmaster.pid (like
"pg_checksums maintenance in progress") that would prevent pg_ctl or
postgres/postmaster from starting up, with an error like 'FATAL:  bogus
data in lock file "postmaster.pid": "pg_checksums in progress"'.
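
As an illustration of the trigger-file variant only, a minimal Python
sketch could look like the following. The file name "maintenance.lock"
and all function names are hypothetical; nothing like this exists in
PostgreSQL today:

```python
import os

MARKER = "maintenance.lock"  # hypothetical name, not an existing PostgreSQL file


def begin_maintenance(datadir, reason):
    """Called by the offline tool before it touches any data file."""
    path = os.path.join(datadir, MARKER)
    # O_EXCL makes creation fail if another tool already holds the marker.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(reason + "\n")


def end_maintenance(datadir):
    """Called by the offline tool once it has finished cleanly."""
    os.remove(os.path.join(datadir, MARKER))


def check_can_start(datadir):
    """Called at server startup: refuse to start while the marker exists."""
    path = os.path.join(datadir, MARKER)
    try:
        with open(path) as f:
            reason = f.read().strip()
    except FileNotFoundError:
        return
    raise RuntimeError("FATAL: " + reason)
```

The obvious downside is that a crashed tool leaves the marker behind, so
there would need to be a documented way to clear it by hand.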

> > About fsync-ing: ISTM that it is possible that the control file is written
> > to disk while data are still not written, so a failure in between would
> > leave the cluster with an inconsistent state. I think that it should fsync
> > the data *then* update the control file and fsync again on that one.
>
> if --enable is used, we fsync the whole data directory after writing
> all the blocks and updating the control file at the end. The case you
> are referring to here is in fsync_pgdata(), not pg_checksums actually,
> because you could reach the same state after a simple initdb.

But in the initdb case you don't have any valuable data in the instance
yet.

> It
> could be possible to reach a state where the control file has
> checksums enabled and some blocks are not correctly synced, still you
> would notice rather quickly if the server is in an incorrect state at
> the follow-up startup.

Would you? I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 
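
The ordering I have in mind, as a rough Python sketch (assuming a POSIX
system where directories can be fsync'ed; sync_datadir is only a crude
analogue of fsync_pgdata(), and the "control" file and commit_enable are
made-up names):

```python
import os


def fsync_path(path):
    """fsync a file or a directory given its path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_datadir(datadir):
    """Flush every file, then the directory containing it, bottom-up."""
    for root, _dirs, files in os.walk(datadir, topdown=False):
        for name in files:
            fsync_path(os.path.join(root, name))
        fsync_path(root)


def commit_enable(datadir, control_name="control"):
    sync_datadir(datadir)                       # 1. rewritten blocks are durable
    control_path = os.path.join(datadir, control_name)
    with open(control_path, "w") as f:          # 2. only now flip the flag
        f.write("data_checksum_version=1\n")
    sync_datadir(datadir)                       # 3. make the flag itself durable
```

With this order, a crash between steps 1 and 3 can at worst leave the
flag unset on a fully written cluster, never the flag set on top of
blocks that did not reach disk.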

Also, I suggest adding a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz
