Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-18 10:54:10
Message-ID: CAA4eK1J6BqO5=ueFAQO+aYyHLaU-oCHrrVFJqHS-i0Ce9aPY2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > FYI - the latest patch failed to apply.
> >
> > [postgres(at)CentOS7-x64 oss_postgres_misc]$ git apply
> > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
> > error: patch failed: src/include/utils/guc_hooks.h:160
> > error: src/include/utils/guc_hooks.h: patch does not apply
>
> Rebased v24. PFA.
>

Few comments:
==============
1.
+ List of physical replication slots that logical replication
with failover
+ enabled waits for.

/logical replication/logical replication slots

2.
If
+ <varname>enable_syncslot</varname> is not enabled on the
+ corresponding standbys, then it may result in indefinite waiting
+ on the primary for physical replication slots configured in
+ <varname>standby_slot_names</varname>
+ </para>

Why the above leads to indefinite wait? I think we should just ignore
standby_slot_names and probably LOG a message in the server for the
same.

3.
+++ b/src/backend/replication/logical/tablesync.c
@@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
*/
walrcv_create_slot(LogRepWorkerWalRcvConn,
slotname, false /* permanent */ , false /* two_phase */ ,
- CRS_USE_SNAPSHOT, origin_startpos);
+ false /* enable_failover */ , CRS_USE_SNAPSHOT,
+ origin_startpos);

As per this code, we won't enable failover for tablesync slots. So,
what happens if we need to failover to new node after the tablesync
worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC?
I think we won't be able to continue replication from failed over
node. If this theory is correct, we have two options (a) enable
failover for sync slots as well, if it is enabled for main slot; but
then after we drop the slot on primary once sync is complete, same
needs to be taken care at standby. (b) enable failover even for the
main slot after all tables are in ready state, something similar to
what we do for two_phase.

4.
+ /* Verify syntax */
+ if (!validate_slot_names(newval, &elemlist))
+ return false;
+
+ /* Now verify if these really exist and have correct type */
+ if (!validate_standby_slots(elemlist))

These two functions serve quite similar functionality which makes
their naming quite confusing. Can we directly move the functionality
of validate_slot_names() into validate_standby_slots()?

5.
+SlotSyncInitConfig(void)
+{
+ char *rawname;
+
+ /* Free the old one */
+ list_free(standby_slot_names_list);
+ standby_slot_names_list = NIL;
+
+ if (strcmp(standby_slot_names, "") != 0)
+ {
+ rawname = pstrdup(standby_slot_names);
+ SplitIdentifierString(rawname, ',', &standby_slot_names_list);

How does this handle the case where '*' is specified for standby_slot_names?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2023-10-18 10:56:39 Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Previous Message Ashutosh Bapat 2023-10-18 10:47:40 Re: BRIN minmax multi - incorrect distance for infinite timestamp/date