From e7a1f9e599ec56a3b142b943b2351b10b5037585 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 19 Jun 2018 16:03:51 +0900
Subject: [PATCH] Add note in pg_rewind documentation about read-only files

When performing pg_rewind, the presence of a read-only file which is not
accessible for writes will cause a failure while processing.  This can
cause the control file of the target data folder to be truncated,
causing it to not be reusable with a successive run.

We have discussed on the thread a couple of ways to deal with the
problem:
1) Consider EACCES failures on relation files as critical but not on
non-relation files, which comes down to custom configuration files as
well as FSM or VM files.  Being able to tell the difference between
custom files and the ones critical for the system requires additional
maintenance with the addition of new filtering rules, which is costly in
the long term.
2) Order the fetched block ranges and delay the truncation of a file
until its first range chunk is received.  That's actually not
reliable either, as when facing a failure some of the relation files may
have been already manipulated.  If by chance one is able to start and
stop the target's server after a failed rewind, there is a good chance
that the instance is already broken.

There are some solutions which could be considered:
1) Add pre-checks making sure that a set of files which are going to be
processed for copy can be sanely written to, and complain about it
before writing any data on the target's data folder.  This has the
disadvantage that a user would still need to rebuild the links used for
what was previously a set of read-only files, and also to fetch
read-only files on the source and save them as raw files locally, which
could be security-sensitive.
2) Prevent the data of read-only files from being fetched from the
source, which would save some bandwidth, but requires the backend's
pg_stat_file() to be extended with an entry's st_mode.  This can only
happen on HEAD.
3) Allow callers to specify custom exclusion rules.  However, this could
be a foot-gun if a rule accidentally filters out critical files.
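
For illustration only, here is a minimal sketch of the pre-check idea in
1), written in Python rather than in pg_rewind's C.  The helper name and
the way the list of files is obtained are assumptions, nothing like this
exists in pg_rewind or in this patch:

```python
import os


def find_unwritable(target_dir, relative_paths):
    """Return entries under target_dir that exist but cannot be opened for writing.

    Hypothetical helper for illustration; not part of pg_rewind.
    """
    unwritable = []
    for rel in relative_paths:
        path = os.path.join(target_dir, rel)
        if not os.path.exists(path):
            # Nothing to open yet, so this sketch skips the entry.
            continue
        try:
            # No O_TRUNC or O_CREAT: the check must not modify the target.
            fd = os.open(path, os.O_WRONLY)
        except OSError:
            unwritable.append(rel)
        else:
            os.close(fd)
    return unwritable
```

A caller would run this over the full list of files to copy and bail out
before touching the target if the result is not empty.  Note that a
superuser can usually open read-only files for writing, so a check of
this kind only reflects what the calling user is allowed to do.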

1) is the one causing the least code churn, still it is not clear to me
if any of those are worth considering anyway as read-only files would
most likely need to be rebuilt after a rewind.  Hence the simplest
solution is to just document the behavior and tell users to not do
that.  We could always consider new solutions in future releases to ease
the handling of such files, but that may not be worth it.  2) would be
rather interesting, but that's a lot of infrastructure to justify.

Also, when pg_rewind fails mid-flight, there is likely no way to
recover the target data folder anyway, in which case a new base
backup is the best option.  A note about this is added to the
documentation as well.

Discussion: https://postgr.es/m/20180104200633.17004.16377%40wrigleys.postgresql.org
---
 doc/src/sgml/ref/pg_rewind.sgml | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 520d843f0e..ee35ce18b0 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -95,6 +95,26 @@ PostgreSQL documentation
    are currently on by default.  <xref linkend="guc-full-page-writes"/>
    must also be set to <literal>on</literal>, but is enabled by default.
   </para>
+
+  <warning>
+   <para>
+    If <application>pg_rewind</application> fails while processing, then
+    the data folder of the target is likely not in a state that can be
+    recovered.  In such a case, taking a new base backup is recommended.
+   </para>
+
+   <para>
+    <application>pg_rewind</application> will fail immediately if it finds
+    files it cannot write directly to.  This can happen for example when
+    the source and the target server use the same file mapping for read-only
+    SSL keys and certificates.  If such files are present on the target server,
+    it is recommended to remove them before running
+    <application>pg_rewind</application>.  After doing the rewind, some of
+    those files may have been copied from the source, in which case it may
+    be necessary to remove the copied data and restore the set of links
+    used before the rewind.
+   </para>
+  </warning>
  </refsect1>
 
  <refsect1>
-- 
2.17.1

