Re: Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andreas Joseph Krogh <andreas(at)visena(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)
Date: 2017-09-14 19:12:50
Message-ID: 20170914191250.GB6595@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 13, 2017 at 12:16:33PM -0400, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > I have applied the attached patch to show examples of using rsync on
> > PGDATA and tablespaces, documented that rsync is only useful when in
> > link mode, and explained more clearly how rsync handles links. You can
> > see the results here:
> >
> > http://momjian.us/pgsql_docs/pgupgrade.html
> >
> > Any more improvements?
>
> First off, I'd strongly suggest that we make "Step 1" in the pg_upgrade
> process be "take a full backup and verify that you're able to restore it
> successfully and without corruption."

I am hesitant to add pg_upgrade-specific nanny language but if we want
to review all upgrade methods and make recommendations, we can do that.
If we need to add more --link-specific warnings, please suggest that.
Thanks.

> I don't particularly care for how this seems to imply that the Rsync
> method is "the" method to use when --link mode is used with pg_upgrade.

Agreed. I have added new text in the attached patch to make it clear
that non-rsync is an option and is easier.

> I'd reword the section title to be along these lines:
>
> If you have streaming replicas or log-shipping standby servers then they
> will also need to be updated. The simplest way to accomplish this is to
> simply rebuild the replicas from scratch once the primary is back
> online. Unfortunately, that can take a long time for larger systems as
> the data has to be copied from the primary to each replica in the
> environment. If --link mode was used with pg_upgrade, the Latest
> checkpoint location matches between the primary and the replica(s) (as
> discussed in Step 8), the rsync utility is available, and the existing
> data directory and new data directory on the replica are able to be in a
> common directory on the same filesystem (as is required on the primary
> for --link mode to be used), then an alternative method may be used to
> update the replicas using rsync which will generally require much less
> time.
>
> Note that this method will end up needlessly copying across temporary
> files and unlogged tables. If these make up a large portion of your
> database, then rebuilding the replicas from scratch may be a better
> option.
>
> With this method, you will not be running pg_upgrade on the standby
> servers, but rather rsync on the primary to sync the replicas to match
> the results of the pg_upgrade on the primary. Do not start any servers
> yet. If you did not use link mode, skip the instructions in this
> section and simply recreate the standby servers.
>
> This method requires that the *old* data directory on the replica be in
> place as rsync will be creating a hard-link tree between the old data
> files on the replica and the new data directory on the replica (as was
> done by pg_upgrade on the primary).

Sorry, I didn't use any of the above text. It seems to be a step
backward in clarity.

> a. Install the new PostgreSQL binaries on standby servers.
>
> ...
>
> b. Make sure the new standby data directories do not exist
>
> If initdb was run on the replica to create a new data directory, remove
> that new data directory (the rsync will recreate it). Do *not* remove
> the existing old data directory.

I clarified "new data directory" in the patch.

> c. Install custom shared object files
>
> ** I would probably move this up to be step 'b' instead, and make step
> 'b' be step 'c' instead.

Why move it? The current ordering seems more logical.

> d. Stop standby servers
>
> ...
>
> *new*
> e. Verify/re-verify that Latest checkpoint location in pg_controldata
> on the replica matches that of the primary (from before the primary
> was upgraded with pg_upgrade).

I added text in the pg_controldata paragraph to mention which standby
upgrade method is references. Repeating the pg_controldata check seems
pointless here.

> f. Save configuration files
>
> ** this should have a caveat that it's only necessary if the config
> files are in the data directory.

I clarified "data directory" in the patch.

> g. Run rsync
>
> ** I am having a hard time figuring out why --delete makes sense here.
> There shouldn't be anything in the new data directory, and we don't
> actually need to delete anything in the old data directory on the
> replica, so what are we doing suggesting --delete be used? Strikes me
> as unnecessairly adding risk, should someone end up doing the wrong
> command. Also, again, if I was doing this, I'd absolutely run rsync
> with --dry-run for starters and review what it is going to do and make
> sure that's consistent with what I'd expect.

I talked with Stephen about this on IM. The issue is that if you don't
do --delete, and there are files in the primary that are not in the
standby, they are copied, but files in the standby and not in the
primary are kept. This could lead to mixed primary/standby log files,
or worse. Using --delete means the new standby exactly matches the new
primary and all the steps you need to adjust after a base backup are the
same.

I added a mention of rsync --dry-run per Stephen's suggestion.

I have also added a paragraph from Magnus that I developed via IM that
explains that you can use rsync to upgrade one standby from another
standby, if the standby has not been started.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

Attachment Content-Type Size
diff text/x-diff 4.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2017-09-14 19:13:56 Re: Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)
Previous Message Alexander Korotkov 2017-09-14 19:10:49 Re: Pluggable storage