Re: upgrading from pg 9.3 to 10

From: bricklen <bricklen(at)gmail(dot)com>
To: sfrost(at)snowman(dot)net
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: upgrading from pg 9.3 to 10
Date: 2018-08-23 00:59:40
Message-ID: CAGrpgQ-fe5XAP0i-AkZEEnrNYv2m4fGEu7PVAAzKbarXnhgRhw@mail.gmail.com
Lists: pgsql-general

Hi Stephen, thanks for the feedback, it is much appreciated!

On Mon, Aug 20, 2018 at 12:15 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * bricklen (bricklen(at)gmail(dot)com) wrote:
> > Using pg_upgrade, it is definitely possible to upgrade 9.3 to 10 in one
> > jump. We did this in production earlier this year for 1500 Postgres
> > clusters.
> > At https://bricklen.github.io/2018-03-27-Postgres-10-upgrade/ I
> documented
> (in excruciating detail, perhaps) how we upgraded those clusters from 9.3
> > to 10 with sub-15 minute downtime per cluster. The only real issues we
> ran
> > into were some corrupted indexes that appeared to be related to 10.1 and
> > 10.2. Upgrading to 10.3/10.4 and reindexing fixed those particular
> > corrupted indexes.
>
> Interesting write-up. A few points:
>
> #1: I'd strongly recommend including something in the write-up about
> checking for unlogged tables. Those can add quite a bit of time to the
> rsync if they aren't truncated before the primary cluster is shut down.
>

A note about UNLOGGED has been added to the page, thanks.
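
In case it helps anyone else, a minimal sketch of finding them (the table
name in the TRUNCATE is a placeholder):

    # list unlogged tables (relpersistence = 'u') in the current database
    psql -Atc "SELECT n.nspname || '.' || c.relname
                 FROM pg_class c
                 JOIN pg_namespace n ON n.oid = c.relnamespace
                WHERE c.relpersistence = 'u' AND c.relkind = 'r';"
    # truncate each one before the final shutdown of the old primary so
    # the rsync doesn't have to copy their data files, e.g.:
    #   psql -c 'TRUNCATE myschema.my_unlogged_table;'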

> #2: The issue with timelines leads me to suspect that you had a
> restore_command defined and that when PG started up, it found the
> timeline history files from the old cluster. If you don't have a
> restore_command set for any of these then I'm very curious what
> happened. The other possibility (though I wouldn't have expected a
> timeline complaint from PG...) is that the replica wasn't fully up to
> date for whatever reason.
>

Entirely plausible; unfortunately, I don't recall what all the settings
were in the replicas, nor what the actual errors were. One puzzling aspect
was that every one of those 1500 clusters had the same primary and replica
settings, and every one of them had been failed back and forth dozens of
times before we upgraded to PG10.
The replica not being fully up to date makes more sense than rsync copying
over history files only part of the time, since we were less focused on the
remote replica than on ensuring the primary was successfully upgraded.
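
For reference, this is roughly how a 9.3 replica's progress could have been
confirmed (pg_last_xlog_*_location() are the 9.3 names; they became
pg_last_wal_*_lsn() in 10):

    # on the 9.3 standby: compare received vs replayed WAL positions
    psql -Atc "SELECT pg_last_xlog_receive_location(),
                      pg_last_xlog_replay_location();"
    # pg_last_xlog_receive_location() is NULL when WAL arrives via
    # archive shipping rather than streaming replication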

> #3: There's a number of checks discussed in the upgrade documentation
> around using the rsync-based method, but it doesn't seem like you did
> those. A mention of them might be good.

If you mean #8 from the docs, where it says "If you are upgrading standby
servers using methods outlined in section Step 10, verify that the old
standby servers are caught up by running pg_controldata against the old
primary and standby clusters. Verify that the “Latest checkpoint location”
values match in all clusters. (There will be a mismatch if old standby
servers were shut down before the old primary.)", we found that the
pg_controldata output from the hot standby never matched the output from
the primary. When I was investigating at the time, I read a note somewhere
that if the replica was not using streaming replication, the pg_controldata
output was unlikely to match. It's possible I misunderstood (or I've
forgotten what I read), but the output did not match in the dozen-ish times
we tested that particular aspect. It's entirely possible I bungled
something, but I was careful to checkpoint, switch WAL files, and shut down
the primary cleanly, then to confirm that all the WALs from the primary had
shipped to the hot standby and been applied successfully before gracefully
shutting down the hot standby. In all that testing, the systems were
quiescent, as we'd already blocked access to the primary.
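
The sequence looked roughly like the sketch below (paths are illustrative,
not our exact layout):

    # on the old primary: checkpoint, switch WAL, then stop cleanly
    # (pg_switch_xlog() is the 9.3 name; it is pg_switch_wal() in 10)
    psql -c "CHECKPOINT;"
    psql -Atc "SELECT pg_switch_xlog();"
    pg_ctl -D /var/lib/postgresql/9.3/main stop -m fast

    # once the standby has replayed everything and been shut down cleanly:
    pg_controldata /var/lib/postgresql/9.3/main | grep 'Latest checkpoint location'
    # run the same pg_controldata command on the standby and compare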
If you meant other checks, I'd be happy to add a note about them if you can
point out which ones they were. It's possible we overlooked something in
our planning and execution of the pg_upgrade steps.

> Note that these are
> particularly important because the rsync will *not* copy over changes to
> the data files except in the relatively rare case of the relfilenode's
> size changing (given that most of them are 1G, that's not too likely).
> The note you have about the rsync taking more time due to "if the
> remote replica was fully caught up when it was shut down" isn't
> accurate- there is no WAL replay that happens on the replica using this
> method to 'catch up' and if WAL replay was required to have this process
> be correct then it simply wouldn't ever work.
>

No doubt the problem exists between my ears and I'm making invalid
assumptions about the problems we ran into.
What I meant to say (and please correct me if this is still inaccurate) was
that once the primary was stopped cleanly, all of the WAL it had generated
needed to be shipped to the remote replica and applied there, bringing the
replica to a point where it was near-identical to the primary. At that
point, the rsync would only ship the changes created by the pg_upgrade
process itself. The point about being "caught up" was that if the replica
hadn't applied all the WALs from the primary, more of its data files would
differ from the primary's, resulting in more data to rsync. I'm happy to be
corrected if that's wrong.
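
For context, my understanding is that the rsync-based method from the docs
boils down to a single invocation along these lines (hostname and paths are
illustrative), and --size-only is why unchanged data files aren't
re-copied:

    # run on the primary after pg_upgrade, with the old and new data
    # directories side by side; relies on pg_upgrade's hard links
    rsync --archive --delete --hard-links --size-only --no-inc-recursive \
          /var/lib/postgresql/9.3 /var/lib/postgresql/10 \
          standby.example.com:/var/lib/postgresql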

> #4: pg_upgrade absolutely resets the timeline to '1', and you shouldn't
> ever copy over history files from the old cluster to the new cluster.
> The new replicas will *also* be on timeline '1'.
>

There was no explicit copying over of a history file, but there was nothing
preventing that from happening either, so it's probable the rsync did
indeed copy it.

> #5: There's no such thing as a 'timeline decrement'. The new cluster
> (either on the primary or the replica) should only ever see itself as
> being on timeline '1' when starting up after the pg_upgrade and before a
> promotion happens.
>

Poor choice of words. pg_upgrade resets the timeline to 1, but in several
cases the replica failed to start after the rsync completed because it was
looking for a higher timeline. That leads me to believe the history file
you mentioned above should have been excluded from the replica.
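
If so, a defensive check before starting the upgraded replica might look
like this (pg_wal was named pg_xlog before 10; the path is illustrative):

    # on the upgraded replica: look for timeline history files carried
    # over from the old cluster
    ls /var/lib/postgresql/10/main/pg_wal/*.history 2>/dev/null
    # if the new cluster should be on timeline 1, these can be removed,
    # or excluded up front with rsync's --exclude='*.history'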

> #6: In note 33, it's unclear what this is referring to. There's no WAL
> which would have been generated by the pg_upgrade (that process is not
> WAL'd). Perhaps some activity needed to be done on the primary before a
> new restartpoint would happen on the replica, due to how pg_basebackup
> needs a restartpoint to begin from when working on a replica.
>

By step #33, both the active primary and the main replica have been started
and are accepting client traffic. This step is for creating the hot
standby at the end of the line ("end of the line" as shown in the image at
the top of the page), so the WAL isn't from pg_upgrade, but from normal
user changes.
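
If your explanation is the cause, then presumably forcing a checkpoint on
the new primary just before taking the base backup would provide the
restartpoint; a rough sketch (hostnames and paths are made up):

    # on the new primary: force a checkpoint so the standby being cloned
    # from has a recent restartpoint for pg_basebackup to start at
    psql -h primary.example.com -c "CHECKPOINT;"

    # then take the base backup from the existing standby
    pg_basebackup -h standby1.example.com \
        -D /var/lib/postgresql/10/standby2 -X stream --progress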

> Thanks!
>
> Stephen
>

Thanks a lot for reviewing my write-up, and for your explanations. Any
corrections are welcome; all errors in that document are mine, and I'd be
pleased to fix them.
