RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-23 02:59:00
Message-ID: TYAPR01MB5866DD3348B5224E0A1BFC3EF51CA@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thanks for giving comments! PSA the new version.

>
======
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message and docs changes before posting (e.g. cut/paste the rendered text into some tool like MSWord or Grammarly or ChatGPT or whatever tool you like and cross-check). There are lots of small typos etc but one up-front check could avoid long cycles of reviewing/reporting/fixing/re-posting/confirming...
>

I checked all of sentences for Grammarly. Sorry for poor English.

>
======
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication slots is
different from other objects.

~

/ths/this/
>

Fixed.

>
doc/src/sgml/ref/pgupgrade.sgml

3.
+ <para>
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+ After the upgrade is complete, then re-enable the subscription. Note that
+ if the new cluser uses different port number from old one,
+ <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+ command must be also executed on subscriber.
+ </para>

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.
>

Fixed.

>
3b.
/cluser/cluster/

~

3c.
Note that
+ if the new cluser uses different port number from old one,
+ <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+ command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.
>

The part was removed.

>
4.
+ <listitem>
+ <para>
+ <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+ of all slots on old cluster must be same as latest checkpoint location.
+ </para>
+ </listitem>

4a.
/on old cluster/on the old cluster/
>

Fixed.

>
4b.
/as latest/as the latest/
>
Fixed.

>
5.
+ <listitem>
+ <para>
+ The output plugins referenced by the slots on the old cluster must be
+ installed on the new PostgreSQL executable directory.
+ </para>
+ </listitem>

/installed on/installed in/ ??
>

"installed in" is better, fixed.

>
6.
+ <listitem>
+ <para>
+ The new cluster must have
+ <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+ configured to value larger than the existing slots on the old cluster.
+ </para>
+ </listitem>

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the old cluster.
>

Fixed.

>
src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no Assert and no checking inside this function to ensure that is the case or not. It seems strange that the *cluster is passed as an argument but then the whole function body and messages assume it can only be a new cluster anyway.

IMO it would be better to rename this function to something like check_new_cluster_logical_replication_slots() and DO NOT pass any parameter but just use the global new_cluster within the function body.
>

Hmm, I followed other functions, e.g., check_for_composite_data_type_usage() is
called only for old one but it has an argument *cluster. What is the difference
between them? Moreover, how about check_for_lost_slots() and
check_for_confirmed_flush_lsn()? Fixed for the moment.

>
8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.
>

The part was removed.

>
9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/
>

Fixed.

>
10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old cluster.
>

Per suggestion from Amit, the part was removed.

>
11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of logical replication slots (%d) on the old cluster.

11b.
I think it would be helpful for the current values to be displayed in the fatal message so the user will know more about what value to set. Notice that my above suggestion has some substitution markers.
>

Changed.

>
src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+ slot_info->slotname,
+ slot_info->plugin,
+ slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.
>

Added.

>
.../pg_upgrade/t/http://003_logical_replication_slots.pl

13.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Create a slot on old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;

13a.
It would be nicer if all the test parts have identical formats. So here it should also say

# Preparations for the subsequent test:
# 1. Create a slot on the old cluster
>

I did not use because there was only one step, but followed the style.

>
13b.
Notice the colon (:) at the end of that comment "Preparations for the subsequent test:". All the other preparation comments in this file should also have a colon.
>

Added.

>
14.
+# Cause a failure at the start of pg_upgrade because wal_level is replica

SUGGESTION
# pg_upgrade will fail because the new cluster wal_level is 'replica'
>

Fixed.

>
15.
+# 1. Create an unnecessary slot on the old cluster

(but it is not unnecessary -- it is necessary for this test!)

SUGGESTION
+# 1. Create a second slot on the old cluster
>

Fixed.

>
16.
+# Cause a failure at the start of pg_upgrade because the new cluster has
+# insufficient max_replication_slots

SUGGESTION
# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
>

Fixed.

>
17.
+# Preparations for the subsequent test.
+# 1. Remove an unnecessary slot

SUGGESTION
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old cluster, so the new cluster config max_replication_slots=1 will now be enough.
>

Fixed.

>
18.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();

Maybe should be some added comments like:
# Check that the slot 'test_slot1' has migrated to the new cluster.
>

Added.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v24-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch application/octet-stream 7.5 KB
v24-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch application/octet-stream 23.4 KB
v24-0003-pg_upgrade-Add-check-function-for-logical-replic.patch application/octet-stream 12.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-08-23 03:00:22 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Hayato Kuroda (Fujitsu) 2023-08-23 02:58:23 RE: [PoC] pg_upgrade: allow to upgrade publisher node