RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-18 13:43:10
Message-ID: TYAPR01MB586604802ABE42E11866762FF51BA@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thanks for reviewing!

> For patch v21-0001...
>
> ======
> 1. SaveSlotToPath
>
> - /* and don't do anything if there's nothing to write */
> - if (!was_dirty)
> + /*
> + * and don't do anything if there's nothing to write, unless it's this is
> + * called for a logical slot during a shutdown checkpoint, as we want to
> + * persist the confirmed_flush_lsn in that case, even if that's the only
> + * modification.
> + */
> + if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
> return;
>
> The condition seems to be coded in a slightly awkward way when
> compared to how the comment was worded.
>
> How about:
> if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))

Changed.

> For patch v21-0002...
>
> ======
> Commit Message
>
> 1.
>
> For pg_upgrade, it query the logical replication slots information from the old
> cluter and restores the slots using the pg_create_logical_replication_slots()
> statements. Note that we need to separate the timing of restoring replication
> slots and other objects. Replication slots, in particular, should not be
> restored before executing the pg_resetwal command because it will remove WALs
> that are required by the slots.
>
> ~
>
> Revisit this paragraph. There are lots of typos etc.

Maybe I sent the patch before finalizing the commit message. Sorry for that.
I reworded the part. Grammarly says OK the new part.

> 1a.
> "For pg_upgrade". I think this wording is a hangover from back when
> the patch was split into two parts for pg_dump and pg_upgrade, but now
> it seems strange.

Yeah, so removed the word.

> 1b.
> /cluter/cluster/

Changed.

> 1c
> /because it/because pg_resetwal/

Changed.

> src/sgml/ref/pgupgrade.sgml
>
> 2.
>
> + <step>
> + <title>Prepare for publisher upgrades</title>
> +
> + <para>
> + <application>pg_upgrade</application> try to dump and restore logical
> + replication slots. This helps avoid the need for manually defining the
> + same replication slot on the new publisher.
> + </para>
> +
>
> 2a.
> /try/attempts to/ ??

Changed.

> 2b.
> Is "dump" the right word here? I didn't see dumping happening in the
> patch anymore.

I replaced "dump and restore" to " migrate". How do you think?

> 3.
>
> + <para>
> + Before you start upgrading the publisher node, ensure that the
> + subscription is temporarily disabled. After the upgrade is complete,
> + execute the
> + <link linkend="sql-altersubscription"><command>ALTER
> SUBSCRIPTION ... DISABLE</command></link>
> + command to update the connection string, and then re-enable the
> + subscription.
> + </para>
>
> 3a.
> That link made no sense in this context.
>
> Don't you mean to say:
> <command>ALTER SUBSCRIPTION ... CONNECTION ...</command>
> 3b.
> Hmm. I wonder now did you *also* mean to describe how to disable? For example:
>
> Before you start upgrading the publisher node, ensure that the
> subscription is temporarily disabled, by executing
> <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
> DISABLE</command></link>.

I wondered which statement should be referred, and finally did incompletely.
Both of ALTER SUBSCRIPTION statements was cited, and a link was added to
DISABLE clause. Is it OK?

> 4.
>
> +
> + <para>
> + Upgrading slots has some settings. At first, all the slots must not be in
> + <literal>lost</literal>, and they must have consumed all the WALs on old
> + node. Furthermore, new node must have larger
> + <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> + than existing slots on old node, and
> + <link linkend="guc-wal-level"><varname>wal_level</varname></link>
> must be
> + <literal>logical</literal>. <application>pg_upgrade</application> will
> + run error if something wrong.
> + </para>
> + </step>
> +
>
> 4a.
> "At first, all the slots must not be in lost"
>
> Apart from being strangely worded, I was not familiar with what it
> meant to say "must not be in lost". Will this be meaningful to the
> user?
>
> IMO this should have more description, e.g. including mentioning the
> "wal_status" attribute with the appropriate link to
> https://www.postgresql.org/docs/current/view-pg-replication-slots.html

Added the reference.

> 4b.
> BEFORE
> Upgrading slots has some settings. ...
> <application>pg_upgrade</application> will run error if something
> wrong.
>
> SUGGESTION
> There are some prerequisites for <application>pg_upgrade</application>
> to be able to upgrade the replication slots. If these are not met an
> error will be reported.

Changed.

> 4c.
> Wondered if this list of prerequisites might be better presented as an
> SGML list.

Changed to <itemizedlist> style.

> src/bin/pg_upgrade/check.c
>
> 5.
> extern char *output_files[];
> +extern int num_slots_on_old_cluster;
>
> ~
>
> IMO something feels not quite right about having this counter floating
> around as a global variable.
>
> Shouldn't this instead be a field member of the old_cluster. That
> seems to be the normal way to hold the cluster-wise info.

Per comment from Amit, the variable was removed.

> 6. check_new_cluster_is_empty
>
> RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
> + DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
> + LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
>
> IIRC I previously suggested adding this 'pDbInfo' variable because
> there are several places that can make use of it.
>
> You are using it only in the NEW code, but did not replace the
> existing other code to make use of it:
> pg_fatal("New cluster database \"%s\" is not empty: found relation \"%s.%s\"",
> new_cluster.dbarr.dbs[dbnum].db_name,

Right, switched to use it. Additionally, it was also used for definition of rel_arr.

> 7. check_for_logical_replication_slots
>
> +
> +/*
> + * Verify the parameter settings necessary for creating logical replication
> + * slots.
> + */
> +static void
> +check_for_logical_replication_slots(ClusterInfo *new_cluster)
> +{
> + PGresult *res;
> + PGconn *conn = connectToServer(new_cluster, "template1");
> + int max_replication_slots;
> + char *wal_level;
> +
> + /* logical replication slots can be dumped since PG17. */
> + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> + return;
> +
> + prep_status("Checking parameter settings for logical replication slots");
> +
> + res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
> + max_replication_slots = atoi(PQgetvalue(res, 0, 0));
> +
> + if (max_replication_slots == 0)
> + pg_fatal("max_replication_slots must be greater than 0");
> + else if (num_slots_on_old_cluster > max_replication_slots)
> + pg_fatal("max_replication_slots must be greater than existing logical "
> + "replication slots on old node.");
> +
> + PQclear(res);
> +
> + res = executeQueryOrDie(conn, "SHOW wal_level;");
> + wal_level = PQgetvalue(res, 0, 0);
> +
> + if (strcmp(wal_level, "logical") != 0)
> + pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
> + wal_level);
> +
> + PQclear(res);
> +
> + PQfinish(conn);
> +
> + check_ok();
>
> ~
>
> 7a.
> +check_for_logical_replication_slots(ClusterInfo *new_cluster)
>
> IMO it is bad practice to name this argument 'new_cluster'. You will
> end up shadowing the global variable of the same name. It seems in
> other similar code where &new_cluster is passed as a parameter the
> function arg there is called just 'cluster'.

Hmm, but check_for_new_tablespace_dir() has an argument 'new_cluster',
AFAICS, the check function only called for new cluster has an argument "new_cluster",
whereas the function called for both or old cluster has "cluster". Am I missing
something, or anyway it should be fixed? Currently I kept it.

> 7b.
> "/* logical replication slots can be dumped since PG17. */"
>
> Is "dumped" the correct word to be used here? Where is the "dump"?

Changed to "migrated"

> 7c.
>
> + if (max_replication_slots == 0)
> + pg_fatal("max_replication_slots must be greater than 0");
> + else if (num_slots_on_old_cluster > max_replication_slots)
> + pg_fatal("max_replication_slots must be greater than existing logical "
> + "replication slots on old node.");
>
> Why is the 1st condition here even needed? Isn't it sufficient just to
> have that 2nd condition to check max_replication_slot is big enough?

Yeah, sufficient. This is a garbage of previous changes. Fixed.

> 8. src/bin/pg_upgrade/dump.c
>
> {
> char sql_file_name[MAXPGPATH],
> log_file_name[MAXPGPATH];
> +
> DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
> ~

Removed.

> ======
> src/bin/pg_upgrade/function.c
>
> 9. get_loadable_libraries -- GENERAL
>
> @@ -46,7 +46,8 @@ library_name_compare(const void *p1, const void *p2)
> /*
> * get_loadable_libraries()
> *
> - * Fetch the names of all old libraries containing C-language functions.
> + * Fetch the names of all old libraries containing C-language functions, and
> + * output plugins used by existing logical replication slots.
> * We will later check that they all exist in the new installation.
> */
> void
> @@ -66,14 +67,21 @@ get_loadable_libraries(void)
> PGconn *conn = connectToServer(&old_cluster, active_db->db_name);
>
> /*
> - * Fetch all libraries containing non-built-in C functions in this DB.
> + * Fetch all libraries containing non-built-in C functions and
> + * output plugins in this DB.
> */
> ress[dbnum] = executeQueryOrDie(conn,
> "SELECT DISTINCT probin "
> "FROM pg_catalog.pg_proc "
> "WHERE prolang = %u AND "
> "probin IS NOT NULL AND "
> - "oid >= %u;",
> + "oid >= %u "
> + "UNION "
> + "SELECT DISTINCT plugin "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status <> 'lost' AND "
> + "database = current_database() AND "
> + "temporary IS FALSE;",
> ClanguageId,
> FirstNormalObjectId);
> totaltups += PQntuples(ress[dbnum]);
>
> ~
>
> Maybe it is OK, but it somehow seems like the new logic has been
> jammed into the get_loadable_libraries() function for coding
> convenience. For example, all the names (function names, variable
> names, structure field names) are referring to "libraries", so the
> plugin seems a bit out of place.

Per discussion with Amit and you [1], I kept the style. Comments atop and in
the function was changed instead.

> 10. get_loadable_libraries
>
> /* Fetch all library names, removing duplicates within each DB */
> for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> ~
>
> This code comment still refers only to library names.

I think this is right, because output plugins are also the library.

> 10. get_loadable_libraries
>
> + "UNION "
> + "SELECT DISTINCT plugin "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status <> 'lost' AND "
> + "database = current_database() AND "
> + "temporary IS FALSE;",
>
> IMO this SQL might be more readable if it uses an alias (like 'rs')
> for the catalog. Then rs.wal_status, rs.database, rs.temporary etc.

Per discussion with Amit and you [1], this comment was ignored.

> src/bin/pg_upgrade/info.c
>
> 11. get_logical_slot_infos_per_db
>
> + snprintf(query, sizeof(query),
> + "SELECT slot_name, plugin, two_phase "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE database = current_database() AND temporary = false "
> + "AND wal_status <> 'lost';");
>
> There was similar SQL in get_loadable_libraries() but there you wrote:
>
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status <> 'lost' AND "
> + "database = current_database() AND "
> + "temporary IS FALSE;",
>
> The WHERE condition order and case are all slightly different. IMO it
> would be better for both SQL fragments to be exactly the same.

Unified to later one.

> 12. get_logical_slot_infos
>
> +int
> +get_logical_slot_infos(ClusterInfo *cluster)
> +{
> + int dbnum;
> + int slotnum = 0;
> +
>
> I think 'slotnum' is not a good name. In other nearby code (e.g.
> print_slot_infos) 'slotnum' is used to mean the index of each slot,
> but here it means the total number of slots. How about a name like
> 'slot_count' or 'nslots' something where the name is more meaningful?

Changed to slot_count.

> 13. free_db_and_rel_infos
>
> +
> + /*
> + * Logical replication slots must not exist on the new cluster before
> + * doing create_logical_replication_slots().
> + */
> + Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
>
> Isn't it more natural to do: Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

Changed.

> src/bin/pg_upgrade/pg_upgrade.c
>
> 14. create_logical_replication_slots
>
> +create_logical_replication_slots(void)
> +{
> + int dbnum;
> + int slotnum;
>
> The 'slotnum' can be declared at a lower scope than this to be closer
> to where it is actually used.

Moved.

> 15. create_logical_replication_slots
>
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + {
> + DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
> + LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
> + PQExpBuffer query,
> + escaped;
> + PGconn *conn;
> + char log_file_name[MAXPGPATH];
> +
> + /* Quick exit if there are no slots */
> + if (!slot_arr->nslots)
> + continue;
>
> The comment is misleading. There is no exiting. Maybe better to say
> something like "Skip this DB if there are no slots".

Changed.

> 16. create_logical_replication_slots
>
> + appendPQExpBuffer(query, "SELECT
> pg_catalog.pg_create_logical_replication_slot(");
> + appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
> + slot_arr->encoding, slot_arr->std_strings);
> + appendPQExpBuffer(query, ", ");
> + appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
> + slot_arr->encoding, slot_arr->std_strings);
> + appendPQExpBuffer(query, ", false, %s);",
> + slot_arr->slots[slotnum].two_phase ? "true" : "false");
>
> I noticed that the function comment for appendStringLiteral() says:
> "We need it in situations where we do not have a PGconn available.
> Where we do, appendStringLiteralConn is a better choice.".
>
> But in this code, we *do* have PGconn available. So, shouldn't we be
> following the advice of the appendStringLiteral() function comment and
> use the other API instead?

Changed to use appendStringLiteralConn.

> 17. create_logical_replication_slots
>
> + /*
> + * The string must be escaped to shell-style, because there is a
> + * possibility that output plugin name contains quotes. The output
> + * string would be sandwiched by the single quotes, so it does not have
> + * to be wrapped by any quotes when it is passed to
> + * parallel_exec_prog().
> + */
> + appendShellString(escaped, query->data);
>
> /sandwiched by/enclosed by/ ???

This part was no longer needed because we do not bypass strings to the
shell. The initial motivation of the change was to execute in parallel, and
the string was escaped with shell-style and pass to psql -c option for the
purpose. But I found that it shows huge performance degradation, so reverted
the change. See my report [2].

> src/bin/pg_upgrade/pg_upgrade.h
>
> 18. LogicalSlotInfo
>
> +/*
> + * Structure to store logical replication slot information
> + */
> +typedef struct
> +{
> + char *slotname; /* slot name */
> + char *plugin; /* plugin */
> + bool two_phase; /* Can the slot decode 2PC? */
> +} LogicalSlotInfo;
>
> Looks a bit strange when the only last field comment is uppercase but
> the others are not. Maybe lowercase everything like for other nearby
> structs.

Changed.

> 19. LogicalSlotInfoArr
>
> +
> +typedef struct
> +{
> + int nslots;
> + LogicalSlotInfo *slots;
> + int encoding;
> + bool std_strings;
> +} LogicalSlotInfoArr;
> +
>
> The meaning of those fields is not always obvious. IMO they can all be
> commented on.

Added. Note that encoding and std_strings were removed because it was
used by appendStringLiteral().

> .../pg_upgrade/t/003_logical_replication_slots.pl
>
> 20.
>
> # Cause a failure at the start of pg_upgrade because wal_level is replica
>
> ~
>
> I wondered if it would be clearer if you had to explicitly set the
> new_node to "replica" initially, instead of leaving it default.

Changed.

> 21.
>
> # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
>
> ~
>
> This related to my earlier code comment in this post -- I didn't
> understand the need to specially test for 0. IIUC, we really are
> interested only to know if there are *sufficient*
> max_replication_slots.

Agreed, removed.

> 22.
>
> 'run of pg_upgrade of old node with small max_replication_slots');
>
> ~
>
> SUGGESTION
> run of pg_upgrade where the new node has insufficient max_replication_slots

Changed.

> 23.
>
> # Preparations for the subsequent test. max_replication_slots is set to
> # appropriate value
> $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
>
> # Remove an unnecessary slot and consume WALs
> $old_node->start;
> $old_node->safe_psql(
> 'postgres', qq[
> SELECT pg_drop_replication_slot('test_slot1');
> SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
> ]);
> $old_node->stop;
>
> ~
>
> Some of that preparation seems unnecessary. I think the new node
> max_replication_slots is 1 already, so if you are going to remove one
> of test_slot1 here then there is only ONE slot left, right? So the
> max_replication_slots on the new node should be OK now. Not only will
> there be less test code needed here, but you will be testing the
> boundary condition of max_replication_slots (which is probably a good
> thing to do).

Removed.

Next version would be available in the upcoming post.

[1]: https://www.postgresql.org/message-id/CAA4eK1LhEwxQmK2ZepYTYDOKp6F8JCFbiBcw5EoQFbs-CjmY7Q%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/TYCPR01MB58701DAEE5E61B07AC84ADBBF51AA%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-08-18 13:51:36 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Hayato Kuroda (Fujitsu) 2023-08-18 13:35:06 RE: [PoC] pg_upgrade: allow to upgrade publisher node