Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Date: 2023-10-09 08:59:23
Message-ID: CALDaNm2RRpqfif3EqZ91uY-sPeOwEjYhAYxxuTyUodA2kHaDug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 6 Oct 2023 at 18:30, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> Based on comments, I revised my patch. PSA the file.
>
> >
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream. In this new API,
> > > we will start with each slot's restart_lsn location and try to process
> > > till the end of WAL, if we encounter any WAL that needs to be
> > > processed (like we need to send the decoded WAL downstream) we can
> > > return a false indicating that there is an unexpected WAL. The reason
> > > to start with restart_lsn is that it is the location that we use to
> > > start scanning the WAL anyway.
>
> I implemented this by using decoding context. The binary upgrade function
> processes WALs from the confirmed_flush, and returns false if some meaningful
> changes are found.
>
> Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
> If the decoding context is in the mode, the output plugin is not loaded, but
> any WALs are decoded without skipping. Also, a new flag "did_process" is also
> added. This flag is set if wrappers for output plugin callbacks are called during
> the silent mode. The upgrading function checks both reorder buffer and the new
> flag because both (non-)transactional changes should be detected. If we only
> check reorder buffer, we miss the non-transactional one.
>
> fast_forward was changed as a variant of decoding mode.
>
> Currently the function is called for all the valid slot. If the approach seems
> good, we can refactor like Bharath said [1].
>
> >
> > > Then, we should also try to create slots before invoking pg_resetwal.
> > > The idea is that we can write a new binary mode function that will do
> > > exactly what pg_resetwal does to compute the next segment and use that
> > > location as a new location (restart_lsn) to create the slots in a new
> > > node. Then, pass it pg_resetwal by using the existing option '-l
> > > walfile'. As we don't have any API that takes restart_lsn as input, we
> > > can write a new API probably for binary mode to create slots that do
> > > take restart_lsn as input. This will ensure that there is no new WAL
> > > inserted by background processes between resetwal and the creation of
> > > slots.
>
> Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
> This function is similar to pg_create_logical_replication_slot(), but the
> restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> filename is returned and it is passed to pg_resetwal command.
>
> One consideration is that pg_log_standby_snapshot() must be executed before
> slots consuming changes. New cluster does not have RUNNING_XACTS records so that
> decoding context on new cluster cannot be create a consistent snapshot as-is.
> This may lead to discard changes during the upcoming consuming event. To
> prevent it the function is called after the final pg_resetwal.

Few comments:
1) Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
this funcion too:
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ Name plugin = PG_GETARG_NAME(1);
+
+ /* Temporary slots is never handled in this function */
+ bool two_phase = PG_GETARG_BOOL(2);

2) Generally we are specifying the slot name in this case, is slot
name null check required:
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+ Name slot_name;
+ XLogRecPtr end_of_wal;
+ LogicalDecodingContext *ctx = NULL;
+ bool has_record;
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ /* Quick exit if the input is NULL */
+ if (PG_ARGISNULL(0))
+ PG_RETURN_BOOL(false);

3) Since this is similar to pg_create_logical_replication_slot, can we
add a comment saying any change in pg_create_logical_replication_slot
would also need the same check to be added in
binary_upgrade_create_logical_replication_slot:
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ Name plugin = PG_GETARG_NAME(1);
+
+ /* Temporary slots is never handled in this function */

4) Any conclusion on this try catch comment, do you want to add which
setting you want to revert in catch, if try/catch is not required we
can remove this comment:
+ ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+ /* XXX: Is PG_TRY/CATCH needed around here? */
+
+ /*
+ * We use silent mode here to decode all changes without
outputting them,
+ * allowing us to detect all the records that could be sent downstream.
+ */

5) I felt these 2 comments can be combined as both are trying to say
the same thing:
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.

6) I feel this memset is not required as we are initializing at the
beginning of function, if you want to keep the memset, the
initialization can be removed:
+ values[2] = CStringGetTextDatum(xlogfilename);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);

7) looks like a typo, "mu" should be "must":
+ /*
+ * Also, we mu execute pg_log_standby_snapshot() when logical
replication
+ * slots are migrated. Because RUNNING_XACTS record is
required to create
+ * a consistent snapshot.
+ */
+ if (count_old_cluster_logical_slots())
+ create_consistent_snapshot();

8) consitent should be consistent:
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+ DbInfo *old_db = &old_cluster.dbarr.dbs[0];
+ PGconn *conn;
+
+ prep_status("Creating a consitent snapshot on new cluster");

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-10-09 09:20:14 Re: Clean up some pg_dump tests
Previous Message Heikki Linnakangas 2023-10-09 08:52:58 Re: Checks in RegisterBackgroundWorker.()