WIP: Restricting pg_rewind to data/wal dirs

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-28 11:22:20
Message-ID: CAN-RpxDPE4baiMMJ6TLd6AiUvrG=YrC05tGxrgp4aUutH9j5TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers


There are still some cleanup bits needed here but I wanted to get feedback
on my general direction.

I hope to submit for commit fest soon if the general feedback is good.
Tests are passing (with adjustments intended for change of behaviour in one
test script). I want to note that Crimson Thompson (also with Adjust) has
provided some valuable discussion on this code.

The Problem:

pg_rewind makes no real guarantees regarding the final state of non-data
files or directories. It.checks to see if the timeline has incremented
(and therefore guarantees that if successful the data directories are on
the same timeline) but for non-data files, these are clobbered if we rewind
and left intact if not. These other files include postgresql.auto.conf,
replication slots, and can include log files.

Copying logs over to the new slave is something one probably never wants to
do (same with replication slots), and the behaviours here can mask
troubleshooting regarding what a particular master failed, cause wal
segments to build up, automatic settings changes, and other undesirable
behaviours. Together these make pg_rewind very difficult to use properly
and push tasks to replication management tooling that the management tools
are not in a good position to handle correctly.

Designing the Fix:

Two proposed fixes have been considered and one selected: Either we
whitelist directories and only rewind those. The other was to allow shell
globs to be provided that could be used to exclude files. The whitelisting
solution was chosen for a number of reasons.

When pg_rewind "succeeds" but not quite correctly the result is usually a
corrupted installation which requires a base backup to replace it anyway.
In a recovery situation, sometimes pressures occur which render human
judgment less effective, and therefore glob exclusion strikes me as
something which would probably do more harm than good, but maybe I don't
understand the use case (comments as to why some people look at the other
solution as preferable would be welcome).

In going with the whitelisting solution, we chose to include all
directories with WAL-related information. This allows more predicable
interaction with other parts of the replication chain. Consequently not
only do we copy pg_wal and pg_xact but also commit timestamps and so forth.

The Solution:

The solution is a whitelist of directories specified which are the only
ones which are synchronised. The relevant part of this patch is:

+/* List of directories to synchronize:

+ * base data dirs (and ablespaces)

+ * wal/transaction data

+ * and that is it.

+ *

+ * This array is null-terminated to make

+ * it easy to expand

+ */


+const char *rewind_dirs[] = {

+ "base",

+ "global",

+ "pg_commit_ts",

+ "pg_logical",

+ "pg_multixact",

+ "pg_serial",

+ "pg_subtrans",

+ "pg_tblspc",

+ "pg_twophase",

+ "pg_wal",

+ "pg_xact",



From there we iterate over this array for directory-based approaches in
copy_fetch.c and with one query per directory in libpq_fetch.c. This also
means shifting from the basic interface from PQexec to PQexecParams. It
would be possible to move to binary formats too, but this was not done
currently in order to simplify code review (that could be a separate
independent patch at a later time).

Testing Done:

The extra files tests correctly test this change in behaviour. The tests
have been modified, and diagnostics in cases of failures expanded, in this
case. The other tests provide good coverage of whether pg_rewind does what
it is supposed to do.

Cleanup still required:

I accidentally left Carp::Always in the PM in this perl module. It will be

I renamed one of the functions used to have a more descriptive name but
currently did not remove the old function yet.

Feedback is very welcome. pg_rewind is a very nice piece of software. I
am hoping that these sorts of changes will help ensure that it is easier to
use and provides more predictable results.
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

Attachment Content-Type Size
pg_rewind_restrict.patch application/octet-stream 11.4 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-10-28 12:46:46 Re: Re: proposal - psql: possibility to specify sort for describe commands, when size is printed
Previous Message Simon Riggs 2017-10-28 10:10:50 Re: MERGE SQL Statement for PG11