From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Peter Smith' <smithpb2250(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Subject: | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Date: | 2023-08-21 13:04:58 |
Message-ID: | TYCPR01MB5870AC61339753C7E3F0F1B0F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Peter,
Thank you for reviewing! The patch can be available in [1].
> Commit Message
>
> 1.
> This commit allows nodes with logical replication slots to be upgraded. While
> reading information from the old cluster, a list of logical replication slots is
> newly extracted. At the later part of upgrading, pg_upgrade revisits the list
> and restores slots by using the pg_create_logical_replication_slots() on the new
> clushter.
>
> ~
>
> 1a
> /is newly extracted/is fetched/
Fixed.
> 1b.
> /using the pg_create_logical_replication_slots()/executing
> pg_create_logical_replication_slots()/
Fixed.
> 1c.
> /clushter/cluster/
Fixed.
> 2.
> Note that it must be done after the final pg_resetwal command during the upgrade
> because pg_resetwal will remove WALs that are required by the slots. Due to the
> restriction, the timing of restoring replication slots is different from other
> objects.
>
> ~
>
> 2a.
> /it must/slot restoration/
You meant to say s/it must/slot restoration must/, right? Fixed.
> 2b.
> /the restriction/this restriction/
>
> ======
> doc/src/sgml/ref/pgupgrade.sgml
>
> 3.
> + <para>
> + <application>pg_upgrade</application> attempts to migrate logical
> + replication slots. This helps avoid the need for manually defining the
> + same replication slot on the new publisher.
> + </para>
>
> /same replication slot/same replication slots/
Fixed.
> 4.
> + <para>
> + Before you start upgrading the publisher cluster, ensure that the
> + subscription is temporarily disabled, by executing
> + <link linkend="sql-altersubscription"><command>ALTER
> SUBSCRIPTION ... DISABLE</command></link>.
> + After the upgrade is complete, execute the
> + <command>ALTER SUBSCRIPTION ... CONNECTION</command>
> command to update the
> + connection string, and then re-enable the subscription.
> + </para>
>
> On the rendered page, it looks a bit strange that DISABLE has a link
> but COMMENTION does not have a link.
Added.
> 5.
> + <para>
> + There are some prerequisites for
> <application>pg_upgrade</application> to
> + be able to upgrade the replication slots. If these are not met an error
> + will be reported.
> + </para>
> +
> + <itemizedlist>
>
> +1 to use all the itemizedlist changes that Amit suggested [1] in his
> attachment.
Yeah, I agreed it is nice. Applied.
> src/bin/pg_upgrade/check.c
>
> 6.
> +static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
>
> IMO the arg name should not shadow a global with the same name. See
> other review comment for this function signature.
OK, fixed.
> 7.
> + /* Extract a list of logical replication slots */
> + get_logical_slot_infos(&old_cluster, live_check);
>
> But 'live_check' is never used?
It is needed for 0003, moved.
> 8. check_for_logical_replication_slots
> +
> +/*
> + * Verify the parameter settings necessary for creating logical replication
> + * slots.
> + */
> +static void
> +check_for_logical_replication_slots(ClusterInfo *new_cluster)
>
> IMO the arg name should not shadow a global with the same name. If
> this is never going to be called with any param other than
> &new_cluster then probably it is better not even to pass have that
> argument at all. Just refer to the global new_cluster inside the
> function.
>
> You can't say that 'check_for_new_tablespace_dir' does it already so
> it must be OK -- I think that the existing function has the same issue
> and it also ought to be fixed to avoid shadowing!
Fixed.
> 9. check_for_logical_replication_slots
>
> + /* logical replication slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> + return;
>
> IMO the code matches the comment better if you say < 1700 instead of <= 1600.
Changed.
> src/bin/pg_upgrade/function.c
>
> 10. get_loadable_libraries
> /*
> - * Fetch all libraries containing non-built-in C functions in this DB.
> + * Fetch all libraries containing non-built-in C functions or referred
> + * by logical replication slots in this DB.
> */
> ress[dbnum] = executeQueryOrDie(conn,
> ~
>
> /referred by/referred to by/
Fixed.
> src/bin/pg_upgrade/info.c
>
> 11.
> +/*
> + * get_logical_slot_infos()
> + *
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + */
> +void
> +get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
> +{
> + int dbnum;
> + int slot_count = 0;
> +
> + if (cluster == &old_cluster)
> + pg_log(PG_VERBOSE, "\nsource databases:");
> + else
> + pg_log(PG_VERBOSE, "\ntarget databases:");
> +
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + {
> + DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
> +
> + get_logical_slot_infos_per_db(cluster, pDbInfo);
> + slot_count += pDbInfo->slot_arr.nslots;
> +
> + if (log_opts.verbose)
> + {
> + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
> + print_slot_infos(&pDbInfo->slot_arr);
> + }
> + }
> +}
> +
>
> 11a.
> Now the variable 'slot_count' is no longer being returned so it seems redundant.
>
> ~
>
> 11b.
> What is the 'live_check' parameter for? Nobody is using it.
These are needed for 0003, moved.
> 12. count_logical_slots
>
> +int
> +count_logical_slots(ClusterInfo *cluster)
> +{
> + int dbnum;
> + int slotnum = 0;
> +
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
> +
> + return slotnum;
> +}
>
> IMO this variable should be called something like 'slot_count'. This
> is the same review comment also made in a previous review. (See [2]
> comment#12).
Changed.
> 13. print_slot_infos
>
> +
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> + int slotnum;
> +
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
> + slot_arr->slots[slotnum].slotname,
> + slot_arr->slots[slotnum].plugin,
> + slot_arr->slots[slotnum].two_phase);
> +}
>
> It might be nicer to introduce a variable, instead of all those array
> dereferences:
>
> LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
Changed.
> 14.
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + {
> + /*
> + * Constructs query for creating logical replication slots.
> + *
> + * XXX: For simplification, pg_create_logical_replication_slot() is
> + * used. Is it sufficient?
> + */
> + appendPQExpBuffer(query, "SELECT
> pg_catalog.pg_create_logical_replication_slot(");
> + appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
> + conn);
> + appendPQExpBuffer(query, ", ");
> + appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
> + conn);
> + appendPQExpBuffer(query, ", false, %s);",
> + slot_arr->slots[slotnum].two_phase ? "true" : "false");
> +
> + PQclear(executeQueryOrDie(conn, "%s", query->data));
> +
> + resetPQExpBuffer(query);
> + }
> +
> + PQfinish(conn);
> +
> + destroyPQExpBuffer(query);
> + }
> +
> + end_progress_output();
> + check_ok();
>
> 14a
> Similar to the previous comment (#13). It might be nicer to introduce
> a variable, instead of all those array dereferences:
>
> LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
Changed.
> 14b.
> It was not clear to me why this command is not being built using
> executeQueryOrDie directly instead of using the query buffer. Is there
> some reason?
I wanted to take care the encoding, that was the reason I used PQExpBuffer
functions, especially appendStringLiteralConn(). IIUC executeQueryOrDie() could
not take care it.
> 14c.
> I think it would be cleaner to have a separate res variable like you
> used elsewhere:
> res = executeQueryOrDie(...)
>
> instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data));
> in one line
Hmm, there are some use cases for PQclear(executeQueryOrDie(...)) style, e.g.,
set_locale_and_encoding() and set_frozenxids(). I do not think your style is good
if the result of the query is not used. please tell me if you find a case that
res = executeQueryOrDie(...) is used but result is not checked.
> src/bin/pg_upgrade/pg_upgrade.
>
> 15.
> +void get_logical_slot_infos(ClusterInfo *cluster, bool live_check);
>
> I didn't see a reason for that 'live_check' parameter.
It was needed for 0003, moved.
> .../pg_upgrade/t/003_logical_replication_slots.pl
>
> 16.
> IMO this would be much easier to read if there were BIG comments
> between the actual TEST parts
>
> For example
>
> # ------------------------------
> # TEST: Confirm pg_upgrade fails is new node wal_level is not 'logical'
> <preparation>
> <test>
> <cleanup>
>
> # ------------------------------
> # TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low
> <preparation>
> <test>
> <cleanup>
>
> # ------------------------------
> # TEST: Successful upgrade
> <preparation>
> <test>
> <cleanup>
Added. 0003 also followed the style.
> 17.
> +# Cause a failure at the start of pg_upgrade because wal_level is replica
> +command_fails(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d', $old_publisher->data_dir,
> + '-D', $new_publisher->data_dir,
> + '-b', $bindir,
> + '-B', $bindir,
> + '-s', $new_publisher->host,
> + '-p', $old_publisher->port,
> + '-P', $new_publisher->port,
> + $mode,
> + ],
> + 'run of pg_upgrade of old node with wrong wal_level');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> + "pg_upgrade_output.d/ not removed after pg_upgrade failure");
>
> The message is ambiguous
>
> BEFORE
> 'run of pg_upgrade of old node with wrong wal_level'
>
> SUGGESTION
> 'run of pg_upgrade where the new node has the wrong wal_level'
Changed.
> 18.
> +# Create an unnecessary slot on old node
> +$old_publisher->start;
> +$old_publisher->safe_psql(
> + 'postgres', qq[
> + SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);
> +]);
> +
> +$old_publisher->stop;
> +
> +# Preparations for the subsequent test. max_replication_slots is set to
> +# smaller than existing slots on old node
> +$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
> +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
>
>
> IMO the comment is misleading. It is not an "unnecessary slot", it is
> just a 2nd slot. And this is all part of the preparation for the next
> test so it should be under the other comment.
>
> For example SUGGESTION changes like this:
>
> # Preparations for the subsequent test.
> # 1. Create an unnecessary slot on the old node
> $old_publisher->start;
> $old_publisher->safe_psql(
> 'postgres', qq[
> SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);
> ]);
> $old_publisher->stop;
> # 2. max_replication_slots is set to smaller than the number of slots
> (2) present on the old node
> $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> # 3. new node wal_level is set correctly
> $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
Followed the style.
> 19.
> +# Remove an unnecessary slot and consume WAL records
> +$old_publisher->start;
> +$old_publisher->safe_psql(
> + 'postgres', qq[
> + SELECT pg_drop_replication_slot('test_slot2');
> + SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL,
> NULL)
> +]);
> +$old_publisher->stop;
> +
>
> This comment should say more like:
>
> # Preparations for the subsequent test.
Followed above style.
> 20.
> +# Actual run, pg_upgrade_output.d is removed at the end
>
> This comment should mention that "successful upgrade is expected"
> because all the other prerequisites are now satisfied.
The suggestion was added to the comment
> 21.
> +$new_publisher->start;
> +my $result = $new_publisher->safe_psql('postgres',
> + "SELECT slot_name, two_phase FROM pg_replication_slots");
> +is($result, qq(test_slot1|t), 'check the slot exists on new node');
>
> Should there be a matching new_pulisher->stop;?
Not sure it is really needed, but added.
Also, the word "node" was replaced to "cluster" because the later word is used
in the doc.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
From | Date | Subject | |
---|---|---|---|
Next Message | Hayato Kuroda (Fujitsu) | 2023-08-21 13:05:57 | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Previous Message | Hayato Kuroda (Fujitsu) | 2023-08-21 13:04:06 | RE: [PoC] pg_upgrade: allow to upgrade publisher node |