Re: Offline enabling/disabling of data checksums

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-19 10:48:25
Message-ID: alpine.DEB.2.21.1903191124080.18118@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Bonjour Michaël,

> Please find attached an updated patch set, I have rebased that stuff
> on top of my recent commits to refactor the control file updates.

Patch applies cleanly, compiles, make check-world seems ok, doc build ok.

It would help if the patch includes a version number. I assume that this
is v7.

Doc looks ok.

Moving the controlfile looks like an effective way to prevent any
concurrent start, as the fs operation is probably atomic and especially if
external tools uses the same trick. However this is not the case yet, eg
"pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method
could be unified, possibly with some functions in "controlfile_utils.c".

However, I think that there still is a race condition because of the order
in which it is implemented:

pg_checksums reads control file
pg_checksums checks control file contents...
** cluster may be started and the control file updated
pg_checksums moves the (updated) control file
pg_checksums proceeds on a running cluster
pg_checksums moves back the control file
pg_checksums updates the control file contents, overriding updates

I think that the correct way to handle this for enable/disable is:

pg_checksums moves the control file
pg_checksums reads, checks, proceeds, updates
pg_checksums moves back the control file

This probably means extending a little bit the update_controlfile function
to allow a suffix. No big deal.

Ok, this might not work, because of the following, less likely, race
condition:

postmaster opens control file RW
pg_checksums moves control file, posmater open file handle follows
...

So ISTM that we really need some locking to have something clean.

Why not always use "pgrename" instead of the strange pg_mv_file macro?

Help line about --check not simplified as suggested in a prior review,
although you said you would take it into account.

Tests look ok.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Prajwal A V 2019-03-19 10:51:16 Contribution to Perldoc for TestLib module in Postgres
Previous Message Jiří Fejfar 2019-03-19 10:47:13 Re: extensions are hitting the ceiling