Re: Allow replication roles to use file access functions

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow replication roles to use file access functions
Date: 2015-09-08 21:22:54
Message-ID: 20150908212254.GU3685@tamriel.snowman.net
Lists: pgsql-hackers

Heikki,

* Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
> On 09/04/2015 08:14 AM, Michael Paquier wrote:
> >Of course, that's not mandatory to fetch them. It is as well not worth
> >the complication to apply a filter to not fetch a portion of the
> >files, and I think that's why Heikki took the approach to fetch
> >everything in PGDATA (except relation files) because that was just
> >more simple to implement as such for little gain.
>
> It's also simpler to explain and reason about. The current behaviour
> of pg_rewind is that the end-result is basically the same as
> completely copying over the data directory. If we start to add
> smarts on what to copy and what not, it gets a lot more complicated.
> Which configuration files to copy, and which not? (Think
> postgresql.auto.conf...)

I've always felt we had a well defined set of things which PG is allowed
and expected to modify vs. what it shouldn't be messing with. Namely,
config files (pg_hba.conf, postgresql.conf, etc) are not things which
are modified by PostgreSQL, which is why they often live outside of
$PGDATA, vs. clog, xlog, the heap, postgresql.auto.conf, etc, which are
all very clearly under PG's control.

> If you want to preserve a file, you can copy it elsewhere first, and
> copy it back after running pg_rewind. Just as you would with "cp" or
> "rsync" (well, with rsync I guess you could pass a command-line
> switch to ignore some files). That might not be perfect, but it's a
> problem you'll have to deal with if you're not using pg_rewind
> anyway.

That's only true if the configs exist in $PGDATA, which they often
don't. What about things like pg_log? Does pg_rewind copy every log
file from the new-master back to the old-master? That strikes me as
useless at best and a terrible idea at worst since it's likely to blow
away old log files.

I had expected pg_rewind to concern itself with exactly what is under
PG's purview to mess with. That includes postgresql.auto.conf, but not
pg_log or postgresql.conf.
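
To make that split concrete, here is a minimal sketch of such a filter. It is not what pg_rewind does; the function name is hypothetical and the path lists are assumptions taken only from the examples in this message, so they are not exhaustive.

```python
from pathlib import PurePosixPath

# Assumption: names taken from the examples in this message, not a complete list.
ADMIN_MANAGED_FILES = {"postgresql.conf", "pg_hba.conf"}
ADMIN_MANAGED_DIRS = {"pg_log"}


def is_pg_managed(relative_path: str) -> bool:
    """Return True if a path relative to $PGDATA is under PostgreSQL's control."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Anything below an admin-managed directory (e.g. pg_log) is left alone.
    if parts[0] in ADMIN_MANAGED_DIRS:
        return False
    # Hand-edited config files at the top level of $PGDATA are left alone.
    if len(parts) == 1 and parts[0] in ADMIN_MANAGED_FILES:
        return False
    return True
```

Under these assumptions postgresql.auto.conf would be copied, while postgresql.conf and everything under pg_log would be skipped.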

Perhaps it's too late to change that but I'm not thrilled with it.

As for this discussion, if we're going to make pg_rewind a magic rsync,
I'd still prefer for it to work through the replication protocol instead
of directly giving users who have the 'replication' attribute access to
call the SQL-level functions for opening/reading files on the
filesystem. If that's objectionable, then I'd suggest we come up with a
new/different way of giving access to those functions instead and tell
users to use that for their pg_rewind user. That said, I do think we'll
need replication protocol capabilities along those lines, since they
might very well be simpler to work with (what happens if there's a >1G
file?) and we might want them for parallel pg_basebackup anyway.
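
On the >1G question, one way a client could cope with the SQL-level functions is to fetch a large file in pieces. The sketch below only computes the (offset, length) pairs; the helper name and the chunk size are assumptions for illustration, and each pair could then be passed as the offset and length arguments of a function such as pg_read_binary_file.

```python
from typing import Iterator, Tuple

# Assumption: an illustrative per-call size, chosen to stay well below 1 GB.
CHUNK_SIZE = 1024 * 1024


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover file_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length
```

This keeps every individual read small, at the cost of one round trip per chunk, which is part of why a replication protocol command might be simpler.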

One thought that I just had would be to have a default 'pg_rewind' role
which has exactly the access needed for pg_rewind to do its job, which
would simplify things for our users, I'd think.

I'll add that to the proposed set of roles in the default roles patch.

Thanks!

Stephen
