Using base backup exclusion filters to reduce data transferred with pg_rewind

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Using base backup exclusion filters to reduce data transferred with pg_rewind
Date: 2018-02-05 07:10:22
Lists: pgsql-hackers

Hi all,

Many threads have touched $subject:

Thread [2] is a bit different, as it discusses WAL segment data that
ends up being useless; still, it aimed at reducing the amount of data
transferred during a rewind. I am not tackling that problem in the patch
set of this thread; it can be discussed separately.

Attached is a patch set implementing what I have mentioned a couple of
times in those threads: having pg_rewind reuse the same exclusion
filtering rules as base backups. After a rewind, a node enters recovery,
so a bunch of the data copied during the rewind ends up being useless
anyway. There have also been many complaints about having to remove
replication slot data manually after every rewind, so I believe that
this makes the tool much easier to use. One particularly useful effect
is that replication slot data gets filtered out.
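The filtering can be sketched in C as follows. This is a minimal sketch, not the code of patches 0004 or 0005: the helper names `in_list()` and `is_excluded()` are hypothetical, and the two lists are assumed to mirror the `excludeDirContents` and `excludeFiles` arrays of basebackup.c as of roughly PostgreSQL 10 (check the actual source). Files are matched on their basename wherever they appear, while for the listed directories only the contents are skipped, so the directory itself still exists on the rewound node.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Directories whose contents are skipped; the directory itself is kept.
 * Assumed to mirror excludeDirContents[] in basebackup.c.
 */
static const char *const exclude_dir_contents[] = {
    "pg_stat_tmp", "pg_replslot", "pg_dynshmem", "pg_notify",
    "pg_serial", "pg_snapshots", "pg_subtrans", NULL
};

/*
 * Files skipped wherever they appear, matched on their basename.
 * Assumed to mirror excludeFiles[] in basebackup.c.
 */
static const char *const exclude_files[] = {
    "postgresql.auto.conf.tmp", "current_logfiles.tmp", "pg_internal.init",
    "backup_label", "tablespace_map", "postmaster.pid", "postmaster.opts",
    NULL
};

/* Hypothetical helper: does name[0..len) equal an entry of a NULL-terminated list? */
bool in_list(const char *name, size_t len, const char *const *list)
{
    for (int i = 0; list[i] != NULL; i++)
    {
        if (strlen(list[i]) == len && strncmp(name, list[i], len) == 0)
            return true;
    }
    return false;
}

/*
 * Hypothetical filter: return true if relpath, a '/'-separated path
 * relative to the data directory, should not be copied by the rewind.
 */
bool is_excluded(const char *relpath)
{
    const char *base;
    const char *p;
    const char *slash;

    assert(relpath != NULL);

    /* Excluded files are matched on their last path component. */
    base = strrchr(relpath, '/');
    base = (base != NULL) ? base + 1 : relpath;
    if (in_list(base, strlen(base), exclude_files))
        return true;

    /*
     * Anything below an excluded directory is skipped.  Only parent
     * components are checked, so "pg_replslot" itself is still copied.
     */
    p = relpath;
    while ((slash = strchr(p, '/')) != NULL)
    {
        if (in_list(p, (size_t) (slash - p), exclude_dir_contents))
            return true;
        p = slash + 1;
    }
    return false;
}
```

For instance, `is_excluded("pg_replslot/slot1/state")` and `is_excluded("base/1/pg_internal.init")` return true, while `is_excluded("pg_replslot")` and `is_excluded("global/pg_control")` return false, so the slot directory survives, empty, on the rewound node.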

To get there, I have been hacking the backend code and ended up doing
quite a bit of shuffling, as system paths like pg_replslot, pg_wal, etc.
are hardcoded in many places; think for example of all the paths
hardcoded in initdb.c. So I have introduced a couple of things:
- src/include/pg_paths.h, a new header that gathers the names of system
files and directories. With this facility in place, the backend code no
longer hardcodes system-related paths such as "base", "global",
"pg_tblspc", "global/pg_control", etc. A fun consequence of this
refactoring is that the system paths of a PostgreSQL instance can be
changed with one-line edits to pg_paths.h; for example, you could change
"base/" to "foo/". This can make PostgreSQL more malleable for forks.
It would be simpler to just hardcode more paths, but I think that this
would not be manageable in the long term, especially as similar logic
spreads further.
- src/include/replication/basebackup_paths.h, which moves the exclusion
rules currently in basebackup.c into a header that both frontends and
backends can consume. This is useful for any backup tool.
- A pg_rewind update making sure that the filters work correctly.
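The idea behind pg_paths.h can be illustrated with a small C sketch. The macro names below (`PG_BASE_DIR`, `PG_CONTROL_FILE`, etc.), the array `data_subdirs` and the helper `build_relation_path()` are hypothetical, not the ones from patch 0001; the point is only that callers compose paths from central definitions through string literal concatenation, so renaming "base" touches a single line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical central definitions, in the spirit of pg_paths.h. */
#define PG_BASE_DIR      "base"
#define PG_GLOBAL_DIR    "global"
#define PG_TBLSPC_DIR    "pg_tblspc"
#define PG_WAL_DIR       "pg_wal"
#define PG_REPLSLOT_DIR  "pg_replslot"

/* Derived paths reuse the directory macros via literal concatenation. */
#define PG_CONTROL_FILE  PG_GLOBAL_DIR "/pg_control"

/* initdb-style list of directories to create, with no literal duplicated. */
const char *const data_subdirs[] = {
    PG_BASE_DIR, PG_GLOBAL_DIR, PG_TBLSPC_DIR, PG_WAL_DIR, PG_REPLSLOT_DIR,
    NULL
};

/*
 * Hypothetical helper: write "<base dir>/<dboid>/<relnode>" into buf.
 * Returns the snprintf() result, i.e. the length the full path needs.
 */
int build_relation_path(char *buf, size_t len, unsigned int dboid,
                        unsigned int relnode)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, PG_BASE_DIR "/%u/%u", dboid, relnode);
}
```

With these definitions, `build_relation_path(buf, sizeof(buf), 1, 1259)` produces "base/1/1259", and changing `PG_BASE_DIR` to "foo" would turn it into "foo/1/1259" without touching any caller.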

The attached patch set consists of the following:
- 0001, which refactors all hardcoded system paths into pg_paths.h.
This modifies only initdb.c and basebackup.c to ease reviews.
- 0002 spreads the path changes and the use of pg_paths.h across the
core code.
- 0003 moves the last set of definitions, for backup_label,
tablespace_map and pg_internal.init.
- 0004 creates basebackup_paths.h, which pg_rewind can consume.
- 0005 makes the changes for pg_rewind.

0001~0003 can be merged together; I have only split them to ease
review.
I am adding that to the next CF.

Attachment Content-Type Size
0001-Refactor-path-definitions-into-a-single-header-file-.patch text/plain 13.9 KB
0002-Replace-all-system-paths-hardcoded-with-data-from-pg.patch text/plain 60.5 KB
0003-Add-backup_label-pg_internal.init-and-tablespace_map.patch text/plain 3.5 KB
0004-Move-base-backup-filter-lists-into-their-own-header-.patch text/plain 7.1 KB
0005-Use-filtering-list-of-base-backups-in-pg_rewind-to-e.patch text/plain 5.1 KB

