From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Subject: | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date: | 2023-08-21 07:16:46 |
Message-ID: | CAHut+Pt7Ri1r2Q7BQ3qwtGtDYFjvBOU8tUOUF-E41-jjxYSFTA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here are some review comments for v22-0002
======
Commit Message
1.
This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
newly extracted. At the later part of upgrading, pg_upgrade revisits the list
and restores slots by using the pg_create_logical_replication_slots() on the new
clushter.
~
1a
/is newly extracted/is fetched/
~
1b.
/using the pg_create_logical_replication_slots()/executing
pg_create_logical_replication_slots()/
~
1c.
/clushter/cluster/
~~~
2.
Note that it must be done after the final pg_resetwal command during the upgrade
because pg_resetwal will remove WALs that are required by the slots. Due to the
restriction, the timing of restoring replication slots is different from other
objects.
~
2a.
/it must/slot restoration/
~
2b.
/the restriction/this restriction/
======
doc/src/sgml/ref/pgupgrade.sgml
3.
+ <para>
+ <application>pg_upgrade</application> attempts to migrate logical
+ replication slots. This helps avoid the need for manually defining the
+ same replication slot on the new publisher.
+ </para>
/same replication slot/same replication slots/
~~~
4.
+ <para>
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+ After the upgrade is complete, execute the
+ <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the
+ connection string, and then re-enable the subscription.
+ </para>
On the rendered page, it looks a bit strange that DISABLE has a link
but COMMENTION does not have a link.
~~~
5.
+ <para>
+ There are some prerequisites for <application>pg_upgrade</application> to
+ be able to upgrade the replication slots. If these are not met an error
+ will be reported.
+ </para>
+
+ <itemizedlist>
+1 to use all the itemizedlist changes that Amit suggested [1] in his
attachment.
======
src/bin/pg_upgrade/check.c
6.
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
IMO the arg name should not shadow a global with the same name. See
other review comment for this function signature.
~~~
7.
+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos(&old_cluster, live_check);
But 'live_check' is never used?
~~~
8. check_for_logical_replication_slots
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
IMO the arg name should not shadow a global with the same name. If
this is never going to be called with any param other than
&new_cluster then probably it is better not even to pass have that
argument at all. Just refer to the global new_cluster inside the
function.
You can't say that 'check_for_new_tablespace_dir' does it already so
it must be OK -- I think that the existing function has the same issue
and it also ought to be fixed to avoid shadowing!
~~~
9. check_for_logical_replication_slots
+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;
IMO the code matches the comment better if you say < 1700 instead of <= 1600.
======
src/bin/pg_upgrade/function.c
10. get_loadable_libraries
/*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions or referred
+ * by logical replication slots in this DB.
*/
ress[dbnum] = executeQueryOrDie(conn,
~
/referred by/referred to by/
======
src/bin/pg_upgrade/info.c
11.
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+ get_logical_slot_infos_per_db(cluster, pDbInfo);
+ slot_count += pDbInfo->slot_arr.nslots;
+
+ if (log_opts.verbose)
+ {
+ pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+ print_slot_infos(&pDbInfo->slot_arr);
+ }
+ }
+}
+
11a.
Now the variable 'slot_count' is no longer being returned so it seems redundant.
~
11b.
What is the 'live_check' parameter for? Nobody is using it.
~~~
12. count_logical_slots
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slotnum = 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slotnum;
+}
IMO this variable should be called something like 'slot_count'. This
is the same review comment also made in a previous review. (See [2]
comment#12).
~~~
13. print_slot_infos
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+ slot_arr->slots[slotnum].slotname,
+ slot_arr->slots[slotnum].plugin,
+ slot_arr->slots[slotnum].two_phase);
+}
It might be nicer to introduce a variable, instead of all those array
dereferences:
LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
~~~
14.
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ /*
+ * Constructs query for creating logical replication slots.
+ *
+ * XXX: For simplification, pg_create_logical_replication_slot() is
+ * used. Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT
pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
+ conn);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
+ conn);
+ appendPQExpBuffer(query, ", false, %s);",
+ slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+ PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+ resetPQExpBuffer(query);
+ }
+
+ PQfinish(conn);
+
+ destroyPQExpBuffer(query);
+ }
+
+ end_progress_output();
+ check_ok();
14a
Similar to the previous comment (#13). It might be nicer to introduce
a variable, instead of all those array dereferences:
LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
~
14b.
It was not clear to me why this command is not being built using
executeQueryOrDie directly instead of using the query buffer. Is there
some reason?
~
14c.
I think it would be cleaner to have a separate res variable like you
used elsewhere:
res = executeQueryOrDie(...)
instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data));
in one line
======
src/bin/pg_upgrade/pg_upgrade.
15.
+void get_logical_slot_infos(ClusterInfo *cluster, bool live_check);
I didn't see a reason for that 'live_check' parameter.
======
.../pg_upgrade/t/003_logical_replication_slots.pl
16.
IMO this would be much easier to read if there were BIG comments
between the actual TEST parts
For example
# ------------------------------
# TEST: Confirm pg_upgrade fails is new node wal_level is not 'logical'
<preparation>
<test>
<cleanup>
# ------------------------------
# TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low
<preparation>
<test>
<cleanup>
# ------------------------------
# TEST: Successful upgrade
<preparation>
<test>
<cleanup>
~~~
17.
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ not removed after pg_upgrade failure");
The message is ambiguous
BEFORE
'run of pg_upgrade of old node with wrong wal_level'
SUGGESTION
'run of pg_upgrade where the new node has the wrong wal_level'
~~~
18.
+# Create an unnecessary slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);
+]);
+
+$old_publisher->stop;
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# smaller than existing slots on old node
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
IMO the comment is misleading. It is not an "unnecessary slot", it is
just a 2nd slot. And this is all part of the preparation for the next
test so it should be under the other comment.
For example SUGGESTION changes like this:
# Preparations for the subsequent test.
# 1. Create an unnecessary slot on the old node
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);
]);
$old_publisher->stop;
# 2. max_replication_slots is set to smaller than the number of slots
(2) present on the old node
$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
# 3. new node wal_level is set correctly
$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
~~~
19.
+# Remove an unnecessary slot and consume WAL records
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_drop_replication_slot('test_slot2');
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)
+]);
+$old_publisher->stop;
+
This comment should say more like:
# Preparations for the subsequent test.
~~~
20.
+# Actual run, pg_upgrade_output.d is removed at the end
This comment should mention that "successful upgrade is expected"
because all the other prerequisites are now satisfied.
~~~
21.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new node');
Should there be a matching new_pulisher->stop;?
------
[1] https://www.postgresql.org/message-id/CAA4eK1%2BdT2g8gmerguNd_TA%3DXMnm00nLzuEJ_Sddw6Pj-bvKVQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/TYAPR01MB586604802ABE42E11866762FF51BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2023-08-21 07:24:47 | Re: Remove distprep |
Previous Message | Gurjeet Singh | 2023-08-21 06:32:30 | Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns |