|From:||Michael Paquier <michael(at)paquier(dot)xyz>|
|To:||Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>|
|Cc:||Andres Freund <andres(at)anarazel(dot)de>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: Offline enabling/disabling of data checksums|
|Views:||Raw Message | Whole Thread | Download mbox|
On Thu, Mar 21, 2019 at 08:17:32AM +0100, Fabien COELHO wrote:
> I can try, but I must admit that I'm fuzzy about the actual issue. Is there
> a problem on a streaming replication with inconsistent checksum settings, or
> You seem to suggest that the issue is more about how some commands or backup
> tools operate on a cluster.
Yes. That's what I am writing about. Imagine for example this case
- primary has checksums enabled.
- standby has checksums disabled.
- a hard crash of the primary happens, there is a failover to the
standby which gets promoted.
- The primary's host is restarted, it is started and stopped once
cleanly to have a clear point in its past timeline where WAL forked
thanks to the generation of at least the shutdown checkpoint generated
by the clean stop.
- pg_rewind is run, copying some pages from the promoted standby,
which don't have checksums, to the primary with checksums enabled, and
causing some pages to have an incorrect checksum.
There is another tool I know of which is called pg_rman, which is a
backup tool able to take incremental backups in the shape of a delta
of relation blocks. Then imagine the following:
- One instance of Postgres runs, has checksums disabled.
- pg_rman takes a full backup of it.
- Checksums are enabled on this instance.
- An incremental backup from the previous full backup point is taken.
If I recall correctly pg_rman takes a copy of the new control file as
well, which tracks checksums as being enabled.
- A crash happens, the data folder is dead.
- Rollback to the previous backup is done, and we restore up to a
point after the incremental backup.
- And you finish with a cluster which has checksums enabled, but as
the initial full backup had checksums disabled, not all the pages may
be in a correct state.
So I think that it makes sense to tell to be careful within the
documentation, but being too much careful in the tool discards also
many possibilities (see the example of the clean failover where it is
possible to enable checksums with no actual downtime). And this part
has a lot of value.
> ISTM that this would not work: The control file update can only be done
> *after* the fsync to describe the cluster actual status, otherwise it is
> just a question of luck whether the cluster is corrupt on an crash while
> fsyncing. The enforced order of operation, with a barrier in between, is the
> important thing here.
Done the switch for this case. For pg_rewind actually I think that
this is an area where its logic could be improved a bit. So first
the data folder is synced, and then the control file is updated. It
took less time to change the code than to write this paragraph,
including the code compilation and one run of the TAP tests,
I have added in the docs a warning about a host crash while doing the
operation, with a recommendation to check the state of the checksums
on the data folder should it happen, and the previous portion of the
docs about clusters. Your suggestion sounds adapted. I would be
tempted to add a bigger warning in pg_rewind or pg_basebackup about
that, but that's a different story for another time.
Does that look fine to you?
|Next Message||Michael Paquier||2019-03-22 00:36:11||Re: PostgreSQL pollutes the file system|
|Previous Message||Amit Langote||2019-03-22 00:21:56||Re: ToDo: show size of partitioned table|