Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-10-19 06:28:12
Message-ID: CALDaNm14Lwk47iuH7hkkB3vOXveq_3rCry6nHi=pNgeX3Z7QSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ======
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE *script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# ------------------------------
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
> > # has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
> > # still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.
> > +# And check the content. Both of slots must be reported that they have
> > +# unconsumed WALs after confirmed_flush_lsn.
> >
> > SUGGESTION
> > # Check the file content. Both slots should be reporting that they have
> > # unconsumed WAL records.
>
> Fixed.
>
> >
> > 8.
> > +# Preparations for the subsequent test:
> > +# 1. Setup logical replication
> > +my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
> > +
> > +$old_publisher->start;
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot1');");
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot2');");
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE PUBLICATION regress_pub FOR ALL TABLES;");
> >
> >
> > 8a.
> > /Setup logical replication/Setup logical replication (first, cleanup
> > slots from the previous tests)/
>
> Fixed.
>
> > 8b.
> > Can't you combine all those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 9.
> > +
> > +# Actual run, successful upgrade is expected
> > +command_ok(
> > + [
> > + 'pg_upgrade', '--no-sync',
> > + '-d', $old_publisher->data_dir,
> > + '-D', $new_publisher->data_dir,
> > + '-b', $bindir,
> > + '-B', $bindir,
> > + '-s', $new_publisher->host,
> > + '-p', $old_publisher->port,
> > + '-P', $new_publisher->port,
> > + $mode,
> > + ],
> > + 'run of pg_upgrade of old cluster');
> >
> > Now that the "Dry run" part is removed, it seems unnecessary to say
> > "Actual run" for this part.
> >
> >
> > SUGGESTION
> > # pg_upgrade should be successful.
>
> Fixed.

Few comments:
1) Even if we comment 3rd point "Emit a non-transactional message",
test_slot2 still appears in the invalid_logical_replication_slots.txt
file. There is something wrong here.
+ # 2. Advance the slot test_slot2 up to the current WAL location, but
+ # test_slot1 still has unconsumed WAL records.
+ $old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false',
'prefix', 'This is a non-transactional message');"
+ );

2) If the test fails here, it is difficult to debug as the
pg_upgrade_output.d directory was removed, so better to keep the
directory as it is this case:
+ # Check the file content. Both slots should be reporting that they have
+ # unconsumed WAL records.
+ like(
+ slurp_file($slots_filename),
+ qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+ 'the previous test failed due to unconsumed WALs');
+ like(
+ slurp_file($slots_filename),
+ qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+ 'the previous test failed due to unconsumed WALs');
+
+ # Clean up
+ rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");

3) The below could be changed:
+ # Check the file content. Both slots should be reporting that they have
+ # unconsumed WAL records.
+ like(
+ slurp_file($slots_filename),
+ qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+ 'the previous test failed due to unconsumed WALs');
+ like(
+ slurp_file($slots_filename),
+ qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+ 'the previous test failed due to unconsumed WALs');

to:
my $result = slurp_file($slots_filename);
is( $result, qq(The slot "test_slot1" has not consumed the WAL yet
The slot "test_slot2" has not consumed the WAL yet
),
'the previous test failed due to unconsumed WALs');

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2023-10-19 07:04:44 Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Previous Message Shlok Kyal 2023-10-19 06:21:27 Re: [PoC] pg_upgrade: allow to upgrade publisher node