|From:||Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>|
|To:||Michael Paquier <michael(at)paquier(dot)xyz>|
|Cc:||Andres Freund <andres(at)anarazel(dot)de>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: Offline enabling/disabling of data checksums|
> On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
>> I think that the motivation/risks should appear before the solution. "As xyz
>> ..., ...", or at least the logical link should be outlined.
>> It is not clear to me whether the following sentences, which seem specific
>> to "pg_rewind", are linked to the previous advice, which seems rather to
>> refer to streaming replication?
> Do you have a better idea of formulation?
I can try, but I must admit that I'm fuzzy about the actual issue. Is
there a problem with streaming replication when the checksum settings
are inconsistent, or not?
You seem to suggest that the issue is more about how some commands or
backup tools operate on a cluster.
I'll reread the thread carefully and will make a proposal.
> Imagine for example a primary-standby with checksums disabled: [...]
Yep, that's cool.
>> Should not disabling in reverse order be safe? the checksum are not checked
> I don't quite understand your comment about the ordering. If all the
> standbys are destroyed first, then enabling/disabling checksums happens
> at a single place.
Sure. I was suggesting that disabling on replicated clusters is possibly
safer, but I do not know the details of replication and checksumming
well enough to be sure about it.
>> After the reboot, some data files are not fully updated with their
>> checksums, although the control file says that they are. It should then
>> fail after a restart when a no-checksum page is loaded?
>> What am I missing?
> Please note that we do that in other tools as well and we live fine
> with that as pg_basebackup, pg_rewind just to name two.
The fact that other commands are exposed to the same potential risk is not
a very good argument for leaving it unfixed.
> I am not saying that it is not a problem in some cases, but I am saying
> that this is not a problem that this patch should solve.
As solving the issue involves exchanging two lines and setting one boolean
parameter to true, I do not see why it should not be done. Fixing the
issue takes much less time than writing about it...
And if other commands can be improved too, that is fine with me.
> If we were to do something about that, it could make sense to make
> fsync_pgdata() smarter so as the control file is flushed last there, or
> define flush strategies there.
ISTM that this would not work: the control file update can only be done
*after* the fsync, so that it describes the cluster's actual status.
Otherwise it is just a question of luck whether the cluster is corrupt
after a crash while fsyncing. The enforced order of operations, with a
barrier in between, is the important thing here.
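
To make the ordering concrete, here is a minimal sketch of what I mean. It is
not the patch's code: the function names, the "control" file name and the
"|csum" marker are all hypothetical stand-ins, and it assumes a POSIX system
where a directory can be opened and fsync'ed. It only shows the sequence:
rewrite the data files, flush them, and only then update and flush the
control file.

```python
import os
import tempfile


def fsync_path(path):
    """Flush a file or directory to stable storage (POSIX assumed)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def enable_checksums(datadir, control_name="control"):
    """Rewrite data files, flush them, and only then flip the control flag.

    Hypothetical illustration; returns the ordered list of steps taken.
    """
    steps = []
    control = os.path.join(datadir, control_name)

    # Step 1: rewrite every data file. Appending a marker stands in for
    # computing and storing real page checksums.
    for name in sorted(os.listdir(datadir)):
        if name == control_name:
            continue
        with open(os.path.join(datadir, name), "ab") as f:
            f.write(b"|csum")
        steps.append("write " + name)

    # Step 2: the barrier. Everything written so far must be on disk
    # before the control file may claim that checksums are enabled.
    for name in sorted(os.listdir(datadir)):
        if name != control_name:
            fsync_path(os.path.join(datadir, name))
    fsync_path(datadir)
    steps.append("fsync data")

    # Step 3: update and flush the control file last.
    with open(control, "wb") as f:
        f.write(b"checksums=on")
        f.flush()
        os.fsync(f.fileno())
    fsync_path(datadir)
    steps.append("fsync control")
    return steps


def demo():
    """Run the sketch on a throwaway directory; return (steps, control)."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("rel_1", "rel_2"):
            with open(os.path.join(d, name), "wb") as f:
                f.write(b"page")
        with open(os.path.join(d, "control"), "wb") as f:
            f.write(b"checksums=off")
        steps = enable_checksums(d)
        with open(os.path.join(d, "control"), "rb") as f:
            return steps, f.read()
```

With this order, a crash at any point before step 3 leaves the control file
saying "off", which is harmless for pages that already carry a checksum. If
the control file were flushed together with the data files in an arbitrary
order, it could reach disk saying "on" while some pages are still unwritten.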