Re: BUG #16497: old and new pg_controldata WAL segment sizes are invalid or do not match

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, ram(dot)maurya(at)lavainternational(dot)in, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16497: old and new pg_controldata WAL segment sizes are invalid or do not match
Date: 2020-06-18 16:36:15
Message-ID: 20200618163615.GI7349@momjian.us
Lists: pgsql-bugs

On Thu, Jun 18, 2020 at 11:55:37AM -0400, Stephen Frost wrote:
> Greetings,
>
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > On Thu, Jun 18, 2020 at 11:41:53AM -0400, Stephen Frost wrote:
> > > Sure, most options to initdb need to be the same between the old cluster
> > > and the new cluster, but this specific option doesn't have to be, since
> > > we require that it's a cleanly shut down cluster, so why are we
> > > complaining about it if it's different..?
> >
> > Did you not read my previous email that we might have added this so we
> > can upgrade replicas?
>
> I don't see how this option is related to dealing with replicas..?
>
> > I am sure I can dig out the commit that added
> > this and find the original cause, but running pg_upgrade on replicas is
> > enough of a reason to me.
>
> pg_upgrade can't be run on replicas, so I don't understand what you're
> referring to here.. The 'upgrading replicas' process that involves
> rsync also requires everything to have been shut down cleanly.

Oh, I forgot we copy the WAL from the primary and don't run initdb on
the standbys, so it might work, but looking at the pg_upgrade code, I
see:

    /* now reset the wal archives in the new cluster */
    prep_status("Resetting WAL archives");
    exec_prog(UTILITY_LOG_FILE, NULL, true, true,
    /* use timeline 1 to match controldata and no WAL history file */
              "\"%s/pg_resetwal\" -l 00000001%s \"%s\"", new_cluster.bindir,
              old_cluster.controldata.nextxlogfile + 8,
              new_cluster.pgdata);

So, while we don't copy over the old WAL, we do assume the new cluster's
WAL will start at the same segment as the old cluster's.  I think this is
because of the LSN stored on user data pages.  pg_resetwal doesn't seem to
care about that, so maybe pg_upgrade doesn't need to either.  I don't know.
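
To make the file-name arithmetic concrete, here is a minimal sketch,
assuming the usual 24-hex-digit WAL file name layout (8 digits of timeline
followed by 16 digits derived from the segment number).  The helper names
wal_file_name, resetwal_start_segment and wal_file_start_lsn are
hypothetical and are not part of pg_upgrade; they only mirror what the
"00000001%s" format and the "nextxlogfile + 8" pointer arithmetic above
appear to do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the WAL file name that holds a given LSN.
 * Assumed layout: timeline (8 hex) + segno / segs_per_id (8 hex)
 * + segno % segs_per_id (8 hex), where segs_per_id = 2^32 / seg_size. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli,
              uint64_t lsn, uint64_t seg_size)
{
    uint64_t segno = lsn / seg_size;
    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}

/* Hypothetical helper: mimic the argument passed to pg_resetwal -l,
 * i.e. drop the old timeline (first 8 characters) and use timeline 1. */
static void
resetwal_start_segment(char *buf, size_t len, const char *nextxlogfile)
{
    assert(strlen(nextxlogfile) == 24);
    snprintf(buf, len, "00000001%s", nextxlogfile + 8);
}

/* Hypothetical helper: first LSN covered by a WAL file name, as
 * interpreted under a given segment size. */
static uint64_t
wal_file_start_lsn(const char *fname, uint64_t seg_size)
{
    unsigned tli, hi, lo;
    int      fields = sscanf(fname, "%8X%8X%8X", &tli, &hi, &lo);

    assert(fields == 3);
    (void) fields;              /* silence warnings when NDEBUG is set */
    return ((uint64_t) hi * (UINT64_C(0x100000000) / seg_size) + lo)
           * seg_size;
}
```

With these helpers, a name produced for a 16MB-segment cluster and then
read back under 64MB segments points at a different LSN, which is one way
to see why the carried-over file name only means the same thing when both
clusters use the same WAL segment size.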

> > Yeah, we could add a flag to pg_upgrade to
> > allow this if you are not upgrading replicas, but why bother? It might
> > even work if you create the new replicas with the same WAL segment size,
> > but why add complexity for pg_upgrade, which is already complex enough.
>
> Users already have to deal with various options that need to be
> configured to match up between the primary and replicas, so this really
> seems like it's entirely independent of pg_upgrade and isn't something
> pg_upgrade needs to be worrying about..

Do you know why we require this step?

https://www.postgresql.org/docs/12/pgupgrade.html

    Also, change wal_level to replica in the postgresql.conf file on
    the new primary cluster.

The other modes don't work? I see this C comment:

     * We unconditionally start/stop the new server because pg_resetwal -o set
     * wal_level to 'minimum'.  If the user is upgrading standby servers using
     * the rsync instructions, they will need pg_upgrade to write its final
     * WAL record showing wal_level as 'replica'.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
