Re: WIP: Restricting pg_rewind to data/wal dirs

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 09:43:13
Message-ID: CAN-RpxBvGPC9UjPe-=_Vq3p6qK7yj5ajT2RgZHtGcFKQjvQzwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

First, thanks for your thoughts on this, and I am interested in probing
them more.

On Mon, Oct 30, 2017 at 9:04 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
> > The Solution:
> > The solution is a whitelist of directories specified which are the only
> ones
> > which are synchronised. The relevant part of this patch is:
> >
> > +/* List of directories to synchronize:
> > + * base data dirs (and ablespaces)
> > + * wal/transaction data
> > + * and that is it.
> > + *
> > + * This array is null-terminated to make
> > + * it easy to expand
> > + */
> >
> > +const char *rewind_dirs[] = {
> > + "base",
> > + "global",
> > + "pg_commit_ts",
> > + "pg_logical",
> > + "pg_multixact",
> > + "pg_serial",
> > + "pg_subtrans",
> > + "pg_tblspc",
> > + "pg_twophase",
> > + "pg_wal",
> > + "pg_xact",
> > + NULL
> > +};
>
> The problem with a white list, which is one reason why I do not favour
> it, is in the case where a feature adds in the data folder a new path
> for its data, particularly in the case where this is critical for a
> base backup. If this folder is not added in this list, then a rewind
> may be silently corrupted as well. There are also a set of directories
> in this list that we do not care about, pg_serial being one.
> pg_subtrans is a second, as it gets zeroed on startup.
>

The problem with an exclude list is that we cannot safely exclude
configuration files or logs (because these could be named anything), right?

Are there any cases right now where you have features added by extensions
that write to directories which are required for a rewind? I am asking
because I would like to see if this is the minimum working change or if
this change is fundamentally broken currently and must be extended to allow
custom paths to be sync'd as well.

>
> And if you think about it, pg_rewind is actually the *same* processing
> as a base backup taken using the replication protocol. So instead of
> this patch I would recommend using excludeFiles and excludeDirContents
> by moving this list to a common header where frontend and backend can
> refer to. Note that basebackup.h includes replnodes.h, so my thoughts
> is that you should split the current header with something like
> basebackup_internal.h which is backend-only, and have pg_rewind fetch
> the list of directories to automatically ignore as those ones. You can
> also simplify a bit the code of pg_rewind a bit so as things like
> postmaster.opts. On top of that, there is visibly no need to tweak the
> SQL query fetching the directory list (which is useful for debugging
> as shaped now actually), but just change process_source_file so as its
> content is not added in the file map.
>

I am not sure it *should* be the same, however. In a backup we probably
want to backup the postgresql.auto.conf, but on a failover, I don't think
we want to clobber configuration. We certainly do not want to sometimes
clobber configuration but not other times (which is what happens right now
in some cases). And under no circumstances do we want to clobber logs on a
failed server with logs on a working server. That's asking for serious
problems in my view.

If you think about it, there's a huge difference in use case in backing up
a database cluster (Including replication slots, configs in the dir etc)
and re-syncing the data so that replication can resume, and I think there
are some dangers that come up when assuming these should be closely tied
together.

>
> Then there is a second case where you do not want a set of folders to
> be included, but those can be useful by default, an example here being
> pg_wal where one might want to have an empty path to begin with. A
> third case is a set of folders that you do not care about, but depends
> on the deployment, being here "log" for example. For those you could
> just add an --exclude-path option which simply piles up paths into a
> linked list when called multiple times. This could happen with a
> second patch.
>

Agreed. And one could add an "--include-path" option to allow for unusual
cases where you want extra directories, such as replication slots, or the
like.

I think another patch could also specifically empty and perhaps create a
replication slot allowing for one to bring tings back up via streaming
replication more safely.

>
> > From there we iterate over this array for directory-based approaches in
> > copy_fetch.c and with one query per directory in libpq_fetch.c. This
> also
> > means shifting from the basic interface from PQexec to PQexecParams. It
> > would be possible to move to binary formats too, but this was not done
> > currently in order to simplify code review (that could be a separate
> > independent patch at a later time).
>
> - res = PQexec(conn, sql);
> + for (p = 0; rewind_dirs[p] != NULL; ++p)
> + {
> + const char *paths[1];
> + paths[0] = rewind_dirs[p];
> + res = PQexecParams(conn, sql, 1, NULL, paths, NULL, NULL, 0);
> Calling multiple times the query to list all directories is messy IMO.
> And this is N-costly processing if there are many files to look at,
> say many relation files.
>

We wouldn't recurse into the relation files dir more than once since this
is filtered out on the initial query in the recursive CTE before
recursion. In other words, the query filters out the top-level query
before it recurses into any directory so there is a smalll cost (planning,
the fact that the root of the pgdata is queried once per) but this seems
pretty minor given that we need at least one query per file that we sync
anyway.

I am open to changing it to an array, but one thing to think about is that
if we want to add an ability to add extra directories, this makes it much
easier to do that.

> --
> Michael
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-10-30 09:44:49 Re: An unlikely() experiment
Previous Message Alvaro Herrera 2017-10-30 09:42:53 Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM