Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Nikolay Samokhvalov <nik(at)postgres(dot)ai>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?
Date: 2023-07-07 13:31:33
Message-ID: ZKgTtaaQsUVqgblt@tamriel.snowman.net

Greetings,

* Nikolay Samokhvalov (nik(at)postgres(dot)ai) wrote:
> On Fri, Jun 30, 2023 at 14:33 Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Fri, Jun 30, 2023 at 04:16:31PM -0400, Robert Haas wrote:
> > > I'm not quite clear on how Nikolay got into trouble here. I don't
> > > think I understand under exactly what conditions the procedure is
> > > reliable and under what conditions it isn't. But there is no way in
> > > heck I would ever advise anyone to use this procedure on a database
> > > they actually care about. This is a great party trick or something to
> > > show off in a lightning talk at PGCon, not something you ought to be
> > > doing with valuable data that you actually care about.
> >
> > Well, it does get used, and if we remove it perhaps we can have it on
> > our wiki and point to it from our docs.

I was never a fan of having it actually documented because it's a pretty
complex and involved process that really requires whoever is doing it to
have a strong understanding of how PG works.

> In my case, we performed some additional writes on the primary before
> running "pg_upgrade -k" and we did it *after* we shut down all the
> standbys. So those changes were not replicated and then "rsync --size-only"
> ignored them. (By the way, that cluster has wal_log_hints=on to allow
> Patroni to run pg_rewind when needed.)

That's certainly going to cause problems.

> But this can happen with anyone who follows the procedure from the docs as
> is and doesn't do any additional steps, because in step 9 "Prepare for
> standby server upgrades":
>
> 1) there is no requirement to follow a specific order to shut down the nodes
> - "Streaming replication and log-shipping standby servers can remain
> running until a later step" should probably be changed to a
> requirement-like "keep them running"

Agreed that it would be good to clarify that the primary should be shut
down first, to make sure everything written by the primary has been
replicated to all of the replicas.

> 2) checking the latest checkpoint position with pg_controldata now looks
> like a thing that is good to do, but with an unclear purpose -- it does
> not seem to be used to support any decision
> - "There will be a mismatch if old standby servers were shut down before
> the old primary or if the old standby servers are still running" should
> probably be rephrased saying that if there is mismatch, it's a big problem

Yes, it's absolutely a big problem and that's the point of the check.
Slightly surprised that we need to explicitly say "if they don't match
then you need to figure out what you did wrong and don't move forward
until you get everything shut down and with matching values", but that's
also why it isn't a great idea to try and do this without a solid
understanding of how PG works.
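
To make that check concrete, here is a rough sketch, not a supported tool.
It assumes you have already captured the pg_controldata output from the
primary and from each standby after everything was shut down. The function
names and the sample output strings are made up for illustration; only the
"Latest checkpoint location" line is taken from what pg_controldata prints.

```python
import re

# Matches the "Latest checkpoint location" line printed by pg_controldata.
_CHECKPOINT_RE = re.compile(
    r"^Latest checkpoint location:\s*(\S+)\s*$", re.MULTILINE
)


def latest_checkpoint_location(controldata_output: str) -> str:
    """Extract the latest checkpoint location (an LSN such as 0/3000060)."""
    match = _CHECKPOINT_RE.search(controldata_output)
    if match is None:
        raise ValueError("no 'Latest checkpoint location' line found")
    return match.group(1)


def checkpoints_match(outputs_by_node: dict[str, str]) -> bool:
    """Return True only if every node reports the same checkpoint location."""
    locations = {
        node: latest_checkpoint_location(output)
        for node, output in outputs_by_node.items()
    }
    return len(set(locations.values())) == 1


# Hypothetical, abbreviated pg_controldata output for illustration only.
PRIMARY = (
    "Database cluster state:               shut down\n"
    "Latest checkpoint location:           0/3000060\n"
)
STANDBY_OK = (
    "Database cluster state:               shut down in recovery\n"
    "Latest checkpoint location:           0/3000060\n"
)
STANDBY_BEHIND = (
    "Database cluster state:               shut down in recovery\n"
    "Latest checkpoint location:           0/2FFFF28\n"
)
```

If checkpoints_match() returns False for any set of nodes, stop there and
work out what went wrong before running rsync.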

> So following the steps as is, if some writes on the primary are not
> replicated (due to whatever reason) before execution of pg_upgrade -k +
> rsync --size-only, then those writes are going to be silently lost on
> standbys.

Yup.
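
For anyone wondering why the loss is silent: with --size-only, rsync skips
any file whose size matches on both sides, and a relation file whose pages
were updated in place usually keeps the same size. The following is only a
toy model of that comparison, not rsync itself; the file names and helper
functions are hypothetical.

```python
import os
import tempfile


def size_only_differs(src: str, dst: str) -> bool:
    """Model of a size-only check: only the file sizes are compared."""
    return os.path.getsize(src) != os.path.getsize(dst)


def content_differs(src: str, dst: str) -> bool:
    """Byte-for-byte comparison, which a size-only check never performs."""
    with open(src, "rb") as a, open(dst, "rb") as b:
        return a.read() != b.read()


def demo() -> tuple[bool, bool]:
    """Build two same-sized files with different contents and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        primary = os.path.join(tmp, "primary_relfile")
        standby = os.path.join(tmp, "standby_relfile")
        # Same size (one 8kB page), different contents: the standby never
        # received the write that modified the page on the primary.
        with open(primary, "wb") as f:
            f.write(b"A" * 8192)
        with open(standby, "wb") as f:
            f.write(b"B" * 8192)
        return size_only_differs(primary, standby), content_differs(primary, standby)
```

Here demo() reports that the size-only check sees no difference while the
contents do differ, which is exactly the case where the standby keeps its
stale copy.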

> I wonder, if we ensure that standbys are fully caught up before upgrading
> the primary, if we check the latest checkpoint positions, are we good to
> use "rsync --size-only", or are there still some concerns? It seems so to
> me, but maybe I'm missing something.

I've seen a lot of success with it.

Ultimately, when I was describing this process, it was always with the
idea that it would be performed by someone quite familiar with the
internals of PG or, ideally, could be an outline of how an interested PG
hacker could write a tool to do it. Hard to say, but I do feel like
having it documented has actually reduced the interest in writing a tool
to do it, which, if that's the case, is quite unfortunate.

Thanks,

Stephen
