Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-11 05:09:03
Message-ID: CAFiTN-t3==-vShbdomFLq6af9rmgP7AAt49P9apeN_a+aTC3ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:

Comments on the latest patch.

1.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

This comment in the commit message is confusing. I understand the
reason but from this, it is not very clear that if resetwal removes
the WAL we needed then why it is good to create after the resetwal. I
think we should make it clear that creating new slot will set the
restart lsn to current WAL location and after that resetwal can remove
those WAL where slot restart lsn is pointing....

2.

+ <itemizedlist>
+ <listitem>
+ <para>
+ All slots on the old cluster must be usable, i.e., there are no slots
+ whose
+ <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+ is <literal>lost</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+ of all slots on the old cluster must be the same as the latest
+ checkpoint location.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The output plugins referenced by the slots on the old cluster must be
+ installed in the new PostgreSQL executable directory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The new cluster must have
+ <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+ configured to a value greater than or equal to the number of slots
+ present in the old cluster.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The new cluster must have
+ <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+ <literal>logical</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>

I think we should also add that the new slot should not have any
permanent existing logical replication slot.

3.
- with the primary.) Replication slots are not copied and must
- be recreated.
+ with the primary.) Replication slots on the old standby are not copied.
+ Only logical slots on the primary are migrated to the new standby,
+ and other slots must be recreated.

This paragraph should be rephrased. I mean first stating that
"Replication slots on the old standby are not copied" and then saying
Only logical slots are migrated doesn't seem like the best way. Maybe
we can just say "Only logical slots on the primary are migrated to the
new standby, and other slots must be recreated."

4.
+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

Rephrase the first line as -> Raise an ERROR if the logical
replication slot is invalidating during an upgrade.

5.
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

For readability change this to if
(GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
of the checks related to this, we are using 1700 so better be
consistent in this.

6.
+ if (nslots_on_new)
+ pg_fatal(ngettext("New cluster must not have logical replication
slots but found %d slot.",
+ "New cluster must not have logical replication slots but found %d slots.",
+ nslots_on_new),
+ nslots_on_new);
...
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level.");
+
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);

I have noticed that the case of the first letter in the pg_fatal
message is not consistent.

7.
+
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {

Why comment says "Is the slot still usable?" I think it should be "Is
the slot usable?" otherwise it appears that we have first fetched the
slots and now we are refetching it and checking whether it is still
usable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2023-09-11 05:26:16 Re: Cleaning up array_in()
Previous Message Michael Paquier 2023-09-11 05:02:38 Re: pg_logical_emit_message() misses a XLogFlush()