Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-10-18 02:01:17
Message-ID: CAHut+Ps0t7tA+APT5e7jBG+wMFv2ic+a4rayhfOsFV40B=1=0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v51-0001

======
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE *script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

~

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.

SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

~~~

2.
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);"
+);

Can't you combine those SQL in the same $old_publisher->safe_psql.

~~~

3.
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

The code doesn't seem to match the comment - is this correct? The
old_publisher created 2 slots, so why are you setting new_publisher
"max_replication_slots = 1" again?

~~~

4.
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+# this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+);

I felt this test would be clearer if you emphasised the state of the
test_slot1 also. e.g.

4a.
BEFORE
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.

SUGGESTION
# 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
# has consumed them.

~

4b.
BEFORE
+# 2. Advance the slot test_slot2 up to the current WAL location

SUGGESTION
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
# still has unconsumed WAL records.

~~~

5.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(

/because the slot still has/because there are slots still having/

~~~

6.
+ [qr//],
+ 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);

/slot/slots/

~~~

7.
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.

SUGGESTION
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.

~~~

8.
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub FOR ALL TABLES;");

8a.
/Setup logical replication/Setup logical replication (first, cleanup
slots from the previous tests)/

~

8b.
Can't you combine all those SQL in the same $old_publisher->safe_psql.

~~~

9.
+
+# Actual run, successful upgrade is expected
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');

Now that the "Dry run" part is removed, it seems unnecessary to say
"Actual run" for this part.

SUGGESTION
# pg_upgrade should be successful.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2023-10-18 02:13:01 Re: The danger of deleting backup_label
Previous Message Michael Paquier 2023-10-18 00:35:17 Re: New WAL record to detect the checkpoint redo location