Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-04-06 08:23:33
Message-ID: CAHut+PvpBsyxj9SrB1ZZ9gP7r1AA5QoTYjpzMcVSjQO2xQy7aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kuroda-san.

This is a WIP review. I'm yet to do more testing and more study of the
POC patch's design.

While reading the code I kept a local list of my review comments.
Meanwhile, there is a long weekend coming up here, so I thought it
would be better to pass these to you now rather than next week in case
you want to address them.

======
General

1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.

pg_dump: "--slot_only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"

help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.

~~~

2.
I felt there maybe should be some pg_dump test cases for that new
option, rather than the current patch where it only seems to be
testing the new pg_dump option via the pg_upgrade TAP tests.

======
Commit message

3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

~

"new option" --> "new pg_upgrade" option

~~~

4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.

~

4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"

~

4b.
Typo: /replicaiton/replication/

~

4c
"leads an ERROR" --> "leads to an ERROR"

======

doc/src/sgml/ref/pg_dump.sgml

5.
+ <varlistentry>
+ <term><option>--slot-only</option></term>
+ <listitem>
+ <para>
+ Dump only replication slots, neither the schema (data definitions) nor
+ data. Mainly this is used for upgrading nodes.
+ </para>
+ </listitem>

SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.

======

doc/src/sgml/ref/pgupgrade.sgml

6.
+ <para>
+ Transport replication slots. Currently this can work only for logical
+ slots, and temporary slots are ignored. Note that pg_upgrade does not
+ check the installation of plugins.
+ </para>

SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...

======

src/bin/pg_dump/pg_dump.c

7. main
{"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
{NULL, 0, NULL, 0}

The blank line is misplaced.

~~~

8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;

typo: /onlu/only/

~~~

9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be
used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
used together");
+

9a.
typo: /replicatin/replication/

~

9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?

~~~

10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }

10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.

~

10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved above that (and also above the 'Dumping
LOs' etc...)

~~~

11. help

+ printf(_(" --slot-only dump only replication
slots, no schema and data\n"));

11a.
SUGGESTION
"no schema and data" --> "no schema or data"

~

11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.

~~~
12. getRepliactionSlots

+/*
+ * getRepliactionSlots
+ * get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)

Function name typo / getRepliactionSlots/ getReplicationSlots/
(also in the comment)

~~~

13. getRepliactionSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 && !dopt->slot_only)
+ return;

Hmmm, is that condition correct? Shouldn't the && be || here?

~~~

14. dumpReplicationSlot

+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.name);
+ query = createPQExpBuffer();
+
+ /*
+ * XXX: For simplification, pg_create_logical_replication_slot() is used.
+ * Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+ slotname);
+ appendStringLiteralAH(query, slotinfo->plugin, fout);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralAH(query, slotinfo->twophase, fout);
+ appendPQExpBuffer(query, ");");
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+ ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+ ARCHIVE_OPTS(.tag = slotname,
+ .description = "REPICATION SLOT",
+ .section = SECTION_POST_DATA,
+ .createStmt = query->data));
+
+ /* XXX: do we have to dump security label? */
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "REPICATION SLOT", slotname,
+ NULL, NULL,
+ slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+ pfree(slotname);
+ destroyPQExpBuffer(query);
+}

14a.
Wouldn't it be better to check the "slotinfo->dobj.dump &
DUMP_COMPONENT_DEFINITION" condition first, before building the query?
For example, see other function dumpIndexAttach().

~

14b.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS description.

~

14c.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.

======

src/bin/pg_dump/pg_dump.h

15. DumpableObjectType

@@ -82,7 +82,8 @@ typedef enum
DO_PUBLICATION,
DO_PUBLICATION_REL,
DO_PUBLICATION_TABLE_IN_SCHEMA,
- DO_SUBSCRIPTION
+ DO_SUBSCRIPTION,
+ DO_REPICATION_SLOT
} DumpableObjectType;

Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/

======

src/bin/pg_upgrade/dump.c

16. generate_old_dump

+ /*
+ * Dump replicaiton slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring replication
+ * slots and other objects. Replication slots, in particular, should
+ * not be restored before executing the pg_resetwal command because it
+ * will remove WALs that are required by the slots.
+ */

Typo: /replicaiton/replication/

======

src/bin/pg_upgrade/pg_upgrade.c

17. main

+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+

I don't think that warrants a "XXX" style comment. It is just a "Note:".

~~~

18. create_replicaiton_slots
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)

Typo: /create_replicaiton_slots/create_replication_slots/

(Function name and comment)

~~~

19. create_replicaiton_slots

+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char slots_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
+ char *opts;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+
+ snprintf(slots_file_name, sizeof(slots_file_name),
+ DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name),
+ DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ opts,
+ old_db->db_name,
+ log_opts.dumpdir,
+ slots_file_name);
+ }

That 'opts' variable seems unnecessary. Why not just pass the string
literal directly when invoking parallel_exec_prog()?

Or if not removed, then at make it const char psql_opts =
"--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

======

src/bin/pg_upgrade/pg_upgrade.h

20.
+#define DB_DUMP_FILE_MASK_FOR_SLOTS "pg_upgrade_dump_%u_slots.custom"

20a.
For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
probably this should be called DB_DUMP_SLOTS_FILE_MASK.

~

20b.
Because the content of this dump/restore file is SQL (not custom
binary) wouldn't a filename suffix ".sql" be better?

======

.../pg_upgrade/t/003_logical_replication.pl

21.
Some parts (formatting, comments, etc) in this file are inconsistent.

21a
");" is sometimes alone on a line, sometimes not

~

21b.
"Init" versus "Create" nodes.

~

21c.
# Check whether changes on new publisher are shipped to subscriber

SUGGESTION
Check whether changes on the new publisher get replicated to the subscriber
~

21d.
$result =
$subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20),
'check changes are shipped to subscriber');

For symmetry with before/after, I think it would be better to do this
same command before the upgrade to confirm q(10) rows.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2023-04-06 08:41:41 Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
Previous Message Amit Kapila 2023-04-06 06:40:57 Re: Minimal logical decoding on standbys