Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-10-19 03:02:00
Message-ID: CALDaNm3E7cYxcfAHhZzQefywxiGxWsCwqBkN7BDeCCt+wzdjzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ======
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE *script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# ------------------------------
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
> > # has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
> > # still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.
> > +# And check the content. Both of slots must be reported that they have
> > +# unconsumed WALs after confirmed_flush_lsn.
> >
> > SUGGESTION
> > # Check the file content. Both slots should be reporting that they have
> > # unconsumed WAL records.
>
> Fixed.
>
> >
> > 8.
> > +# Preparations for the subsequent test:
> > +# 1. Setup logical replication
> > +my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
> > +
> > +$old_publisher->start;
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot1');");
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot2');");
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE PUBLICATION regress_pub FOR ALL TABLES;");
> >
> >
> > 8a.
> > /Setup logical replication/Setup logical replication (first, cleanup
> > slots from the previous tests)/
>
> Fixed.
>
> > 8b.
> > Can't you combine all those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 9.
> > +
> > +# Actual run, successful upgrade is expected
> > +command_ok(
> > + [
> > + 'pg_upgrade', '--no-sync',
> > + '-d', $old_publisher->data_dir,
> > + '-D', $new_publisher->data_dir,
> > + '-b', $bindir,
> > + '-B', $bindir,
> > + '-s', $new_publisher->host,
> > + '-p', $old_publisher->port,
> > + '-P', $new_publisher->port,
> > + $mode,
> > + ],
> > + 'run of pg_upgrade of old cluster');
> >
> > Now that the "Dry run" part is removed, it seems unnecessary to say
> > "Actual run" for this part.
> >
> >
> > SUGGESTION
> > # pg_upgrade should be successful.
>
> Fixed.

Few comments:
1) We will be able to override the value of max_slot_wal_keep_size by
using --new-options like '--new-options "-c
max_slot_wal_keep_size=val"':
+ /*
+ * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+ * checkpointer process. If WALs required by logical replication slots
+ * are removed, the slots are unusable. This setting prevents the
+ * invalidation of slots during the upgrade. We set this option when
+ * cluster is PG17 or later because logical replication slots
can only be
+ * migrated since then. Besides, max_slot_wal_keep_size is
added in PG13.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ appendPQExpBufferStr(&pgoptions, " -c
max_slot_wal_keep_size=-1");

Should there be a check to throw an error if this option is specified
or do we need some documentation that this option should not be
specified?

2) Because we are able to override max_slot_wal_keep_size there is a
chance of slot getting invalidated and Assert being hit:
+ /*
+ * The logical replication slots shouldn't be invalidated as
+ * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+ *
+ * The following is just a sanity check.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ {
+ Assert(max_slot_wal_keep_size_mb == -1);
+ elog(ERROR, "replication slots must not be
invalidated during the upgrade");
+ }

3) File 003_logical_replication_slots.pl is now changed to
003_upgrade_logical_replication_slots.pl, it should be change here too
accordingly:
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
PGAPPICON = win32

+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2023-10-19 03:16:05 Re: Removing unneeded self joins
Previous Message Richard Guo 2023-10-19 02:57:57 Re: boolin comment not moved when code was refactored