RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-12 11:50:22
Message-ID: TYAPR01MB5866AC50DCFCBC989D67E6BDF5F1A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thank you for reviewing! PSA new version.

> src/backend/replication/slot.c
>
> 3. InvalidatePossiblyObsoleteSlot
>
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
>
> 3a.
> That comment didn't seem good. I think you mean like in the suggestion below.
>
> SUGGESTION
> It should not be possible for logical replication slots to be
> invalidated because max_slot_wal_keep_size is set to -1 during the
> upgrade. The following is just for sanity-checking.

This part was updated in v35. Please tell me if current version is still bad...

> 3b.
> I wasn't sure if 'max_slot_wal_keep_size' GUC is accessible in this
> scope, but if it is available then maybe
> Assert(max_slot_wal_keep_size_mb == -1); should also be included in
> this sanity check.

IIUC, guc parameters are visible from all the postgres processes.
Added.

> src/bin/pg_upgrade/check.c
>
> 4. check_new_cluster_logical_replication_slots
>
> + conn = connectToServer(&new_cluster, "template1");
> +
> + prep_status("Checking for logical replication slots");
>
> There is some inconsistency with all the subsequent pg_fatals within
> this function -- some of them mention "New cluster" but most of them
> do not.
>
> Meanwhile, Kuroda-san showed me sample output like:
>
> Checking for presence of required libraries ok
> Checking database user is the install user ok
> Checking for prepared transactions ok
> Checking for new cluster tablespace directories ok
> Checking for logical replication slots
> New cluster must not have logical replication slots but found 1 slot.
> Failure, exiting
>
> So, I felt the log message title ("Checking...") should be changed to
> include the words "new cluster" just like the log preceding it:
>
> "Checking for logical replication slots" ==> "Checking for new cluster
> logical replication slots"
>
> Now all the subsequent pg_fatals clearly are for "new cluster"

Changed.

> 5. check_new_cluster_logical_replication_slots
>
> + if (nslots_on_new)
> + pg_fatal(ngettext("New cluster must not have logical replication
> slots but found %d slot.",
> + "New cluster must not have logical replication slots but found %d slots.",
> + nslots_on_new),
> + nslots_on_new);
>
> 5a.
> TBH, I didn't see why you go to unnecessary trouble to have a plural
> message here. The message could just be like:
> "New cluster must have 0 logical replication slots but found %d."
>
> ~
>
> 5b.
> However, now (from the previous review comment #4) if "New cluster" is
> already explicit in the log, the pg_fatal message can become just:
> "New cluster must have ..." ==> "Expected 0 logical replication slots
> but found %d."

Basically it's better. But the initial character should be lower case and period
is not needed. Modified like that.

> 9. get_old_cluster_logical_slot_infos
>
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> + i_caught_up = PQfnumber(res, "caught_up");
> + i_invalid = PQfnumber(res, "conflicting");
>
> IMO SQL should be using an alias for this column, so you can say:
> i_invalid = PQfnumber(res, "invalid")
>
> which seems better than switching the wording in code.

Modified. The argument of PQfnumber() must be same as the column name, so the
word "as invalid" was added to SQL.

> src/bin/pg_upgrade/pg_upgrade.h
>
> 10. LogicalSlotInfo
>
> +typedef struct
> +{
> + char *slotname; /* slot name */
> + char *plugin; /* plugin */
> + bool two_phase; /* can the slot decode 2PC? */
> + bool caught_up; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> + bool invalid; /* Is the slot usable? */
> +} LogicalSlotInfo;
>
> ~
>
> + bool invalid; /* Is the slot usable? */
> This field name and comment have opposite meanings. Invalid means NOT usable.
>
> SUGGESTION
> /* If true, the slot is unusable. */

Fixed.

> src/bin/pg_upgrade/server.c
>
> 11. start_postmaster
>
> * we only modify the new cluster, so only use it there. If there is a
> * crash, the new cluster has to be recreated anyway. fsync=off is a big
> * win on ext4.
> + *
> + * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
> + * required by logical slots. The setting could avoid the invalidation of
> + * slots during the upgrade.
> */
> ~
>
> IMO this comment "to prevent the WAL removal required by logical
> slots" is ambiguous about how it could be interpreted. Needs
> rearranging for clarity.

The description was changed. How do you think?

> 12. start_postmaster
>
> (cluster == &new_cluster) ?
> - " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
> + " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c
> max_slot_wal_keep_size=-1 " :
> + " -c max_slot_wal_keep_size=-1",
>
> Instead of putting the same option on both sides of the ternary, I was
> wondering if it might be better to hardwire the max_slot_wal_keep_size
> just 1 time in the format string?

Fixed.

> .../pg_upgrade/t/003_logical_replication_slots.pl
>
> 13.
> # Remove the remained slot
>
> /remained/remaining/

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v36-0001-Flush-logical-slots-to-disk-during-a-shutdown-ch.patch application/octet-stream 9.8 KB
v36-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch application/octet-stream 39.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-09-12 11:50:30 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Andrey Lepikhov 2023-09-12 11:49:05 Re: Removing unneeded self joins