RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-04-07 13:59:58
Message-ID: TYCPR01MB5870E212F5012FD6272CE1E3F5969@TYCPR01MB5870.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thank you for reviewing briefly. PSA new version.
If you can I want to ask the opinion about the checking by pg_upgrade [1].

> ======
> General
>
> 1.
> Since these two new options are made to work together, I think the
> names should be more similar. e.g.
>
> pg_dump: "--slot_only" --> "--replication-slots-only"
> pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"
>
> help/comments/commit-message all should change accordingly, but I did
> not give separate review comments for each of these.

OK, I renamed. By the way, how do you think the suggestion raised by Julien?
Currently I did not address it because the restriction was caused by just lack of
analysis, and this may be not agreed in the community.
Or, should we keep the name anyway?

> 2.
> I felt there maybe should be some pg_dump test cases for that new
> option, rather than the current patch where it only seems to be
> testing the new pg_dump option via the pg_upgrade TAP tests.

Hmm, I supposed that the option shoul be used only for upgrading, so I'm not sure
it must be tested by only pg_dump.

> Commit message
>
> 3.
> This commit introduces a new option called "--include-replication-slot".
> This allows nodes with logical replication slots to be upgraded. The commit can
> be divided into two parts: one for pg_dump and another for pg_upgrade.
>
> ~
>
> "new option" --> "new pg_upgrade" option

Fixed.

> 4.
> For pg_upgrade, when '--include-replication-slot' is specified, it
> executes pg_dump
> with added option and restore from the dump. Apart from restoring
> schema, pg_resetwal
> must not be called after restoring replicaiton slots. This is because
> the command
> discards WAL files and starts from a new segment, even if they are required by
> replication slots. This leads an ERROR: "requested WAL segment XXX has already
> been removed". To avoid this, replication slots are restored at a different time
> than other objects, after running pg_resetwal.
>
> ~
>
> 4a.
> "with added option and restore from the dump" --> "with the new
> "--slot-only" option and restores from the dump"

Fixed.

> 4b.
> Typo: /replicaiton/replication/

Fixed.

> 4c
> "leads an ERROR" --> "leads to an ERROR"

Fixed.

> doc/src/sgml/ref/pg_dump.sgml
>
> 5.
> + <varlistentry>
> + <term><option>--slot-only</option></term>
> + <listitem>
> + <para>
> + Dump only replication slots, neither the schema (data definitions) nor
> + data. Mainly this is used for upgrading nodes.
> + </para>
> + </listitem>
>
> SUGGESTION
> Dump only replication slots; not the schema (data definitions), nor
> data. This is mainly used when upgrading nodes.

Fixed.

> doc/src/sgml/ref/pgupgrade.sgml
>
> 6.
> + <para>
> + Transport replication slots. Currently this can work only for logical
> + slots, and temporary slots are ignored. Note that pg_upgrade does not
> + check the installation of plugins.
> + </para>
>
> SUGGESTION
> Upgrade replication slots. Only logical replication slots are
> currently supported, and temporary slots are ignored. Note that...

Fixed.

> src/bin/pg_dump/pg_dump.c
>
> 7. main
> {"exclude-table-data-and-children", required_argument, NULL, 14},
> -
> + {"slot-only", no_argument, NULL, 15},
> {NULL, 0, NULL, 0}
>
> The blank line is misplaced.

Fixed.

> 8. main
> + case 15: /* dump onlu replication slot(s) */
> + dopt.slot_only = true;
> + dopt.include_everything = false;
> + break;
>
> typo: /onlu/only/

Fixed.

> 9. main
> + if (dopt.slot_only && dopt.dataOnly)
> + pg_fatal("options --replicatin-slots and -a/--data-only cannot be
> used together");
> + if (dopt.slot_only && dopt.schemaOnly)
> + pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
> used together");
> +
>
> 9a.
> typo: /replicatin/replication/

Fixed. Additionally, wrong parameter reference was also fixed.

> 9b.
> I am wondering if these checks are enough. E.g. is "slots-only"
> compatible with "no-publications" ?

I think there are something what should be checked more. But I'm not sure about
"no-publication". There is a possibility that non-core logical replication is used,
and at that time these options are not contradicted.

> 10. main
> + /*
> + * If dumping replication slots are request, dumping them and skip others.
> + */
> + if (dopt.slot_only)
> + {
> + getRepliactionSlots(fout);
> + goto dump;
> + }
>
> 10a.
> SUGGESTION
> If dump replication-slots-only was requested, dump only them and skip
> everything else.

Fixed.

> 10b.
> This code seems mutually exclusive to every other option. I'm
> wondering if this code even needs 'collectRoleNames', or should the
> slots option check be moved above that (and also above the 'Dumping
> LOs' etc...)

I read again, and I found that collected username are used to check the owner of
objects. IIUC replicaiton slots are not owned by database users, so it is not
needed. Also, the LOs should not dumped here. Based on them, I moved getRepliactionSlots()
above them.

> 11. help
>
> + printf(_(" --slot-only dump only replication
> slots, no schema and data\n"));
>
> 11a.
> SUGGESTION
> "no schema and data" --> "no schema or data"

Fixed.

> 11b.
> This help is misplaced. It should be in alphabetical order consistent
> with all the other help.
>
> ~~~
> 12. getRepliactionSlots
>
> +/*
> + * getRepliactionSlots
> + * get information about replication slots
> + */
> +static void
> +getRepliactionSlots(Archive *fout)
>
> Function name typo / getRepliactionSlots/ getReplicationSlots/
> (also in the comment)

Fixed.

> 13. getRepliactionSlots
>
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 160000 && !dopt->slot_only)
> + return;
>
> Hmmm, is that condition correct? Shouldn't the && be || here?

Right, fixed.

> 14. dumpReplicationSlot
>
> +static void
> +dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
> +{
> + DumpOptions *dopt = fout->dopt;
> + PQExpBuffer query;
> + char *slotname;
> +
> + if (!dopt->slot_only)
> + return;
> +
> + slotname = pg_strdup(slotinfo->dobj.name);
> + query = createPQExpBuffer();
> +
> + /*
> + * XXX: For simplification, pg_create_logical_replication_slot() is used.
> + * Is it sufficient?
> + */
> + appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
> + slotname);
> + appendStringLiteralAH(query, slotinfo->plugin, fout);
> + appendPQExpBuffer(query, ", ");
> + appendStringLiteralAH(query, slotinfo->twophase, fout);
> + appendPQExpBuffer(query, ");");
> +
> + if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
> + ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
> + ARCHIVE_OPTS(.tag = slotname,
> + .description = "REPICATION SLOT",
> + .section = SECTION_POST_DATA,
> + .createStmt = query->data));
> +
> + /* XXX: do we have to dump security label? */
> +
> + if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
> + dumpComment(fout, "REPICATION SLOT", slotname,
> + NULL, NULL,
> + slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
> +
> + pfree(slotname);
> + destroyPQExpBuffer(query);
> +}
>
> 14a.
> Wouldn't it be better to check the "slotinfo->dobj.dump &
> DUMP_COMPONENT_DEFINITION" condition first, before building the query?
> For example, see other function dumpIndexAttach().

The style was chosen because previously I referred dumpSubscription(). But I read
PG manual and understood that COMMENT and SECURITY LABEL cannot be set to replication
slots. Therefore, I removed comments and dump for DUMP_COMPONENT_COMMENT, then
followed the style.

> 14b.
> Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS
> description.
>
> ~
>
> 14c.
> Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.

Both of them were fixed.

> src/bin/pg_dump/pg_dump.h
>
> 15. DumpableObjectType
>
> @@ -82,7 +82,8 @@ typedef enum
> DO_PUBLICATION,
> DO_PUBLICATION_REL,
> DO_PUBLICATION_TABLE_IN_SCHEMA,
> - DO_SUBSCRIPTION
> + DO_SUBSCRIPTION,
> + DO_REPICATION_SLOT
> } DumpableObjectType;
>
> Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/

Fixed.

> src/bin/pg_upgrade/dump.c
>
> 16. generate_old_dump
>
> + /*
> + * Dump replicaiton slots if needed.
> + *
> + * XXX We cannot dump replication slots at the same time as the schema
> + * dump because we need to separate the timing of restoring replication
> + * slots and other objects. Replication slots, in particular, should
> + * not be restored before executing the pg_resetwal command because it
> + * will remove WALs that are required by the slots.
> + */
>
> Typo: /replicaiton/replication/

Fixed.

> src/bin/pg_upgrade/pg_upgrade.c
>
> 17. main
>
> + /*
> + * Create replication slots if requested.
> + *
> + * XXX This must be done after doing pg_resetwal command because the
> + * command will remove required WALs.
> + */
> + if (user_opts.include_slots)
> + {
> + start_postmaster(&new_cluster, true);
> + create_replicaiton_slots();
> + stop_postmaster(false);
> + }
> +
>
> I don't think that warrants a "XXX" style comment. It is just a "Note:".

Fixed. Could you please tell me the classification of them if you can?

> 18. create_replicaiton_slots
> +
> +/*
> + * create_replicaiton_slots()
> + *
> + * Similar to create_new_objects() but only restores replication slots.
> + */
> +static void
> +create_replicaiton_slots(void)
>
> Typo: /create_replicaiton_slots/create_replication_slots/
>
> (Function name and comment)

All of them were replaced.

> 19. create_replicaiton_slots
>
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + {
> + char slots_file_name[MAXPGPATH],
> + log_file_name[MAXPGPATH];
> + DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
> + char *opts;
> +
> + pg_log(PG_STATUS, "%s", old_db->db_name);
> +
> + snprintf(slots_file_name, sizeof(slots_file_name),
> + DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
> + snprintf(log_file_name, sizeof(log_file_name),
> + DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
> +
> + opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
> +
> + parallel_exec_prog(log_file_name,
> + NULL,
> + "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
> + new_cluster.bindir,
> + cluster_conn_opts(&new_cluster),
> + opts,
> + old_db->db_name,
> + log_opts.dumpdir,
> + slots_file_name);
> + }
>
> That 'opts' variable seems unnecessary. Why not just pass the string
> literal directly when invoking parallel_exec_prog()?
>
> Or if not removed, then at make it const char psql_opts =
> "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

I had tried to follow the prepare_new_globals() style, but
I preferred your suggestion. Fixed.

> src/bin/pg_upgrade/pg_upgrade.h
>
> 20.
> +#define DB_DUMP_FILE_MASK_FOR_SLOTS
> "pg_upgrade_dump_%u_slots.custom"
>
> 20a.
> For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
> probably this should be called DB_DUMP_SLOTS_FILE_MASK.

Fixed.

> 20b.
> Because the content of this dump/restore file is SQL (not custom
> binary) wouldn't a filename suffix ".sql" be better?

Right, fixed.

> .../pg_upgrade/t/003_logical_replication.pl
>
> 21.
> Some parts (formatting, comments, etc) in this file are inconsistent.
>
> 21a
> ");" is sometimes alone on a line, sometimes not

I ran pgperltidy and lonely ");" is removed.

> 21b.
> "Init" versus "Create" nodes.

"Initialize" was chosen.

> 21c.
> # Check whether changes on new publisher are shipped to subscriber
>
> SUGGESTION
> Check whether changes on the new publisher get replicated to the subscriber

Fixed.

> 21d.
> $result =
> $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
> is($result, qq(20),
> 'check changes are shipped to subscriber');
>
> For symmetry with before/after, I think it would be better to do this
> same command before the upgrade to confirm q(10) rows.

Added.

[1]: https://www.postgresql.org/message-id/20230407024823.3j2s4doslsjemvis%40jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v2-0001-pg_upgrade-Add-include-replication-slots-option.patch application/octet-stream 22.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-04-07 14:00:23 Re: pgsql: psql: add an optional execution-count limit to \watch.
Previous Message Thomas Munro 2023-04-07 13:32:22 Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?