Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-11 06:53:05
Message-ID: CAA4eK1LHH_=wbxsEn20=W+qz1193OqFj-vvJ-u0uHLMmwLHbRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Thank you for reviewing! PSA new version! PSA new version.
>

Few comments:
==============
1.
+ <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+ of all slots on the old cluster must be the same as the latest
+ checkpoint location.

We can add something like: "This ensures that all the data has been
replicated before the upgrade." to make it clear why this test is
important.

2. Move the wal_level related restriction before max_replication_slots.

3.
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);
+ }
+
+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all
+ * the slots is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been
+ * shut down, so we skip this for live checks.
+ */
+ if (!live_check && !slot->caught_up)

Isn't it better to continue for the next slot once we find that slot
is invalid instead of checking other conditions?

4.
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);

Let's keep it as one string and change the message to: "The slot
"\"%s\" is invalid"

+ fprintf(script,
+ "slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+ slot->slotname);
+ }

On a similar line, we can change this to: "The slot "\"%s\" has not
consumed the WAL yet"

5.
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "problematic_logical_relication_slots.txt");

I think we can name this file as "invalid_logical_replication_slots"
or simply "logical_replication_slots"

6.
+ pg_fatal("The source cluster contains one or more problematic
logical replication slots.\n"
+ "The needed workaround depends on the problem.\n"
+ "1) If the problem is \"The slot is unusable,\" You can drop such
replication slots.\n"
+ "2) If the problem is \"The slot has not consumed WALs yet,\" you
can consume all remaining WALs.\n"
+ "Then, you can restart the upgrade.\n"
+ "A list of problematic logical replication slots is in the file:\n"
+ " %s", output_path);

This doesn't match the similar existing comments. So, let's change it
to something like:

"Your installation contains invalid logical replication slots. These
slots can't be copied so this cluster cannot currently be upgraded.
Consider either removing such slots or consuming the pending WAL if
any and then restart the upgrade. A list of invalid logical
replication slots is in the file:"

Apart from the above, I have edited a few other comments in the patch.
See attached.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
cosmetic_improvements_amit.1.patch application/octet-stream 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema 2023-09-11 07:04:22 Re: PSQL error: total cell count of XXX exceeded
Previous Message Hongxu Ma 2023-09-11 06:50:39 Re: PSQL error: total cell count of XXX exceeded