Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using base backup exclusion filters to reduce data transferred with pg_rewind
Date: 2018-03-27 01:55:44
Message-ID: 20180327015544.GA1172@paquier.xyz
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
> +1. It's better for us to focus on the code change of the fillter on pg_rewind
> rather than such "refactoring".

(filter takes one 'l', not two)

Okay. I had my mind mostly focused on how to shape the exclusion list
and get it shared between the base backup and pg_rewind, so let's move
on with the duplicated list for now. I did not put much effort into
the pg_rewind portion, to be honest.

> As I told upthread, the previous patch has the
> problem where the files which should be skipped are not skipped. ISTM that,
> in pg_rewind, the filter should be triggered in recurse_dir() not
> process_source_file().

If you put that into recurse_dir() you completely ignore the case where
changes are fetched by libpq. Doing the filtering when processing the
file map has the advantage of taking care of both the local and remote
cases, which is why I am doing it there. With recurse_dir() you would
get just half of the cake and not the whole of it.
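
To illustrate the point about filtering at the file map level, here is a
minimal sketch, not the code of the attached patch. All names in it
(file_is_excluded, process_source_entry, traverse_local, traverse_remote)
and the shortened exclusion list are hypothetical. Both traversal methods
hand their entries to one function, so a filter placed there covers both:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, shortened exclusion list; the real one lives in the patch. */
static const char *const exclude_files[] = {
    "postmaster.pid",
    "postmaster.opts",
    NULL
};

static int n_kept = 0;      /* entries that reached the file map */
static int n_excluded = 0;  /* entries dropped by the filter */

/* Return true if the last path component matches an excluded file name. */
static bool
file_is_excluded(const char *path)
{
    const char *name;
    int         i;

    assert(path != NULL);
    name = strrchr(path, '/');
    name = (name != NULL) ? name + 1 : path;

    for (i = 0; exclude_files[i] != NULL; i++)
    {
        if (strcmp(name, exclude_files[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Single entry point for every source entry, whatever produced it.
 * Filtering here covers both traversal methods.
 */
static void
process_source_entry(const char *path)
{
    if (file_is_excluded(path))
    {
        n_excluded++;
        printf("entry \"%s\" excluded from source file list\n", path);
        return;
    }
    n_kept++;
}

/* Stand-in for the local case: walking a source data directory on disk. */
static void
traverse_local(const char *const *paths, int npaths)
{
    int         i;

    for (i = 0; i < npaths; i++)
        process_source_entry(paths[i]);
}

/* Stand-in for the remote case: rows of a file listing fetched over libpq. */
static void
traverse_remote(const char *const *rows, int nrows)
{
    int         i;

    for (i = 0; i < nrows; i++)
        process_source_entry(rows[i]);
}
```

Had the check been placed inside traverse_local() only, entries coming
through traverse_remote() would reach the file map unfiltered.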

> BTW what should pg_rewind do when it finds the directory which should be
> skipped, in the source directory? In your patch, pg_rewind just tries to skip
> that directory at all. But isn't this right behavior? If that directory doesn't
> exist in the target directory (though I'm not sure if this situation is really
> possible), I'm thinking that pg_rewind should create that "empty" directory
> in the target. No?

I am not exactly sure what you are getting at here. The target
server should have the same basic directory mapping as the source, as the
target has been initialized normally with initdb or a base backup from
another node, so checking for the *contents* of directories is enough
and keeps the code simpler, as the exclude filter entries are based
on elements inherent to PostgreSQL internals. Please note as well that
if a non-system directory is present on the source but not on the target
then it would get created on the target.
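
As a rough sketch of what filtering on directory *contents* means (again
not the patch itself; path_in_excluded_dir and the two-entry list are
assumptions made for illustration), the check matches only paths strictly
below a listed directory, never the directory entry itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative subset: directories whose contents are skipped. */
static const char *const exclude_dir_contents[] = {
    "pg_notify",
    "pg_subtrans",
    NULL
};

/*
 * Return true if "path" (relative to the data directory) lies strictly
 * inside one of the listed directories. The directory entry itself does
 * not match, so it is still processed like any other entry.
 */
static bool
path_in_excluded_dir(const char *path)
{
    int         i;

    assert(path != NULL);
    for (i = 0; exclude_dir_contents[i] != NULL; i++)
    {
        size_t      len = strlen(exclude_dir_contents[i]);

        /* Require "<dir>/" followed by at least one more character. */
        if (strncmp(path, exclude_dir_contents[i], len) == 0 &&
            path[len] == '/' && path[len + 1] != '\0')
            return true;
    }
    return false;
}
```

With this shape, "pg_subtrans/0000" is excluded while "pg_subtrans" is
not, and a look-alike name such as "pg_subtrans_old/0000" is left alone.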

In the end I have finished with the attached. I have also decided not
to include xlog.h in pg_rewind, to avoid having to drag in a lot of
backend-only headers like pg_resetwal does. I prefer to avoid that, as
the only cost is hardcoding the values for "backup_label" and
"tablespace_map".
This applies filters based on directory contents, so by running the
regression tests you can see entries like the following ones:
entry "postmaster.opts" excluded from source file list
entry "pg_subtrans/0000" excluded from source file list
entry "pg_notify/0000" excluded from source file list
entry "base/12360/pg_internal.init" excluded from source file list
entry "backup_label.old" excluded from source file list
entry "global/pg_internal.init" excluded from source file list
entry "postmaster.opts" excluded from target file list
entry "pg_subtrans/0000" excluded from target file list
entry "pg_notify/0000" excluded from target file list
entry "base/12360/pg_internal.init" excluded from target file list
entry "global/pg_internal.init" excluded from target file list

Processing the filemap list on the target also matters in my opinion.
At recovery, all the previous files will be wiped out, and we should
not remove things like postmaster.pid either, as those are around to
prevent corruption problems.
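
A small sketch of the target-side decision, under the assumption that a
file found only on the target is normally scheduled for removal. The
enum, is_protected_file and decide_target_action are hypothetical names
used for illustration, not identifiers from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum
{
    TARGET_KEEP,                /* leave the file alone */
    TARGET_REMOVE               /* delete it from the target */
} target_action;

/* Hypothetical helper: files that must never be touched on the target. */
static bool
is_protected_file(const char *path)
{
    const char *name;

    assert(path != NULL);
    name = strrchr(path, '/');
    name = (name != NULL) ? name + 1 : path;

    return strcmp(name, "postmaster.pid") == 0 ||
        strcmp(name, "postmaster.opts") == 0;
}

/* Decide what to do with one entry of the target file list. */
static target_action
decide_target_action(const char *path, bool exists_in_source)
{
    /* The exclusion check comes first, so protected files are never removed. */
    if (is_protected_file(path))
        return TARGET_KEEP;

    /* A target-only file is stale and gets removed. */
    if (!exists_in_source)
        return TARGET_REMOVE;

    return TARGET_KEEP;
}
```

Without the first check, a postmaster.pid that exists only on the target
would fall into the removal branch.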

Thanks,
--
Michael

Attachment Content-Type Size
0001-Add-exclude-list-similar-to-base-backups-in-pg_rewin.patch text/x-diff 8.5 KB