Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-22 00:31:25
Message-ID: CAHut+Ps+Varx8uUF5c7YMXCSMO6BZ8SUXc9pv5iXkM2rqh7DEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kuroda-san,

Here are some review comments for v22-0003.

(FYI, I was already mid-way through this review before you posted new v23*
patches, so I am posting it anyway in case some comments still apply.)

======
src/bin/pg_upgrade/check.c

1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a
Maybe the comment should start uppercase for consistency with others.

~

1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with
the comment.

~~~

2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+ PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore

~~~

3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.

~

3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with
the comment.

~~~

4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore

======
src/bin/pg_upgrade/controldata.c

5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/

~~~

5b.
>= 17 should be >= 1700

~~~

6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a
better mirror for LSN_FORMAT_ARGS.

======
src/bin/pg_upgrade/info.c

7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this
function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the
check_new_cluster() function.

For consistency, I think this OLD cluster checking should be done in the
check_and_dump_old_cluster() function -- see the "Check for various failure
cases" comment -- IMO this new fragment belongs there with the other checks.

======
src/bin/pg_upgrade/pg_upgrade.h

8.
bool date_is_int;
bool float8_pass_by_value;
uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
} ControlData;

I don't think the new field is particularly different from all the others
that it needs a blank line separator.

======
.../t/003_logical_replication_slots.pl

9.
# Initialize old node
my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
$old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;

# Initialize new node
my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
$new_publisher->init(allows_streaming => 'replica');

-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Are those removal of the old_publisher start/stop changes that actually
should be done in the 0002 patch?

~~~

10.
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding',
false, true);
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL,
NULL);
]);

~

What is the purpose of the added SELECT? It doesn't seem covered by the
comment.

~~~

11.
# Remove an unnecessary slot and generate WALs. These records would not be
# consumed before doing pg_upgrade, so that the upcoming test would fail.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot2');
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
]);
$old_publisher->stop;

Minor rewording of comment sentence.

SUGGESTION
Because these WAL records do not get consumed it will cause the upcoming
pg_upgrade test to fail.

~~~

12.
# Cause a failure at the start of pg_upgrade because the slot still have
# unconsumed WAL records

~

/still have/still has/

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-08-22 01:44:07 Re: should frontend tools use syncfs() ?
Previous Message Michael Paquier 2023-08-21 23:58:42 Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue