Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-01 09:39:01
Message-ID: CAA4eK1LWKkoyy-p-SAT0JTWa=6kXiMd=a6ZcArY9eU4a3g4TZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 28, 2023 at 5:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> Here is a patch which checks that there are no WAL records other than
> CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
> from [1].
>

Few comments:
=============
1. Do we really need 0001 patch after the latest change proposed by
Vignesh in the 0004 patch?

2.
+ if (dopt.logical_slots_only)
+ {
+ if (!dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+
+ if (dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");

Can you please explain why the patch imposes these restrictions? I
guess the binary_upgrade is because you want this option to be used
for the upgrade. Do we want to avoid giving any other option with
logical_slots, if so, are the above checks sufficient and why?

3.
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ */
+ appendPQExpBufferStr(query,
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
or WALAVAIL_UNRESERVED? I think the slots where wal status is
WALAVAIL_REMOVED, the corresponding slots are invalidated at some
point. I think such slots can't be used for decoding but these will be
dropped along with the subscription or when a user does it manually.
So, if we don't copy such slots after the upgrade then there could be
a problem in dropping the corresponding subscription. If we don't want
to copy over such slots then we need to provide instructions on what
users should do in such cases. OTOH, if we want to copy over such
slots then we need to find a way to invalidate such slots after copy.
Either way, this needs more analysis.

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if
the confirmed_flush_lsn for a slot is too much behind. Can we think of
improving this by passing the number of records to read which in this
case should be 1?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-08-01 10:25:10 RE: Fix compilation warnings when CFLAGS -Og is specified
Previous Message José Neves 2023-08-01 09:13:47 RE: CDC/ETL system on top of logical replication with pgoutput, custom client