Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-20 03:27:10
Message-ID: CAJpy0uAb7j2ZNVnm_Mvt+ofCvK1Wh17-d-Jm5ZCq=6V0k327xA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > FYI - the latest patch failed to apply.
> > >
> > > [postgres(at)CentOS7-x64 oss_postgres_misc]$ git apply
> > > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
> > > error: patch failed: src/include/utils/guc_hooks.h:160
> > > error: src/include/utils/guc_hooks.h: patch does not apply
> >
> > Rebased v24. PFA.
> >
>
> Few comments:
> ==============
> 1.
> + List of physical replication slots that logical replication
> with failover
> + enabled waits for.
>
> /logical replication/logical replication slots
>
> 2.
> If
> + <varname>enable_syncslot</varname> is not enabled on the
> + corresponding standbys, then it may result in indefinite waiting
> + on the primary for physical replication slots configured in
> + <varname>standby_slot_names</varname>
> + </para>
>
> Why the above leads to indefinite wait? I think we should just ignore
> standby_slot_names and probably LOG a message in the server for the
> same.
>
> 3.
> +++ b/src/backend/replication/logical/tablesync.c
> @@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
> */
> walrcv_create_slot(LogRepWorkerWalRcvConn,
> slotname, false /* permanent */ , false /* two_phase */ ,
> - CRS_USE_SNAPSHOT, origin_startpos);
> + false /* enable_failover */ , CRS_USE_SNAPSHOT,
> + origin_startpos);
>
> As per this code, we won't enable failover for tablesync slots. So,
> what happens if we need to failover to new node after the tablesync
> worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC?
> I think we won't be able to continue replication from failed over
> node. If this theory is correct, we have two options (a) enable
> failover for sync slots as well, if it is enabled for main slot; but
> then after we drop the slot on primary once sync is complete, same
> needs to be taken care at standby. (b) enable failover even for the
> main slot after all tables are in ready state, something similar to
> what we do for two_phase.
>
> 4.
> + /* Verify syntax */
> + if (!validate_slot_names(newval, &elemlist))
> + return false;
> +
> + /* Now verify if these really exist and have correct type */
> + if (!validate_standby_slots(elemlist))
>
> These two functions serve quite similar functionality which makes
> their naming quite confusing. Can we directly move the functionality
> of validate_slot_names() into validate_standby_slots()?
>
> 5.
> +SlotSyncInitConfig(void)
> +{
> + char *rawname;
> +
> + /* Free the old one */
> + list_free(standby_slot_names_list);
> + standby_slot_names_list = NIL;
> +
> + if (strcmp(standby_slot_names, "") != 0)
> + {
> + rawname = pstrdup(standby_slot_names);
> + SplitIdentifierString(rawname, ',', &standby_slot_names_list);
>
> How does this handle the case where '*' is specified for standby_slot_names?
>
>
> --
> With Regards,
> Amit Kapila.

PFA v25 patch set. The changes are:

1) 'enable_failover' is changed to 'failover'.
2) ALTER SUBSCRIPTION changes to support 'failover'.
3) Fixes a bug in patch001 where a change to standby_slot_names was
not picked up while logical walsenders were waiting for the standby's
confirmation. Now, if standby_slot_names changes during the wait, the
wait is restarted with the new value.
4) Addresses comments by Bertrand and Amit in [1],[2],[3].
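
To illustrate the behaviour described in item 3, here is a minimal standalone sketch. It is not the patch code: DemoSlot, ConfigReader, wait_for_standbys() and the demo_* helpers are hypothetical names invented for this example, and the "config reload" is simulated by a callback that returns the current standby_slot_names string on each loop iteration. It only shows the idea of re-reading the setting inside the wait loop and restarting the check against the new list when it differs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 8
#define MAX_NAME  64

/* Hypothetical stand-in for a physical slot and the LSN it has confirmed. */
typedef struct
{
    const char   *name;
    unsigned long confirmed_lsn;
} DemoSlot;

/* Hypothetical callback: returns the current standby_slot_names value. */
typedef const char *(*ConfigReader) (int iteration);

/* Split a comma-separated list into names (no whitespace trimming). */
static int
parse_slot_names(const char *raw, char names[MAX_SLOTS][MAX_NAME])
{
    int         count = 0;
    const char *p = raw;

    while (*p != '\0' && count < MAX_SLOTS)
    {
        size_t len = strcspn(p, ",");

        if (len > 0 && len < MAX_NAME)
        {
            memcpy(names[count], p, len);
            names[count][len] = '\0';
            count++;
        }
        p += len;
        if (*p == ',')
            p++;
    }
    return count;
}

/* A listed slot counts as caught up only if it exists and has reached lsn. */
static bool
slot_caught_up(const char *name, const DemoSlot *slots, size_t nslots,
               unsigned long lsn)
{
    for (size_t i = 0; i < nslots; i++)
        if (strcmp(slots[i].name, name) == 0)
            return slots[i].confirmed_lsn >= lsn;
    return false;               /* unknown slot: keep waiting */
}

/*
 * Wait until every slot in the configured list has confirmed wait_for_lsn.
 * The list is re-read on every iteration; if it changed, the wait restarts
 * with the new list. Returns the number of restarts, or -1 if
 * max_iterations was reached without success.
 */
int
wait_for_standbys(ConfigReader read_config, const DemoSlot *slots,
                  size_t nslots, unsigned long wait_for_lsn,
                  int max_iterations)
{
    char current[256];
    int  restarts = 0;

    snprintf(current, sizeof(current), "%s", read_config(0));

    for (int i = 0; i < max_iterations; i++)
    {
        const char *latest = read_config(i);
        char        names[MAX_SLOTS][MAX_NAME];
        bool        all_ok = true;
        int         n;

        if (strcmp(latest, current) != 0)
        {
            /* The setting changed during the wait: restart with new list. */
            snprintf(current, sizeof(current), "%s", latest);
            restarts++;
        }

        n = parse_slot_names(current, names);
        for (int j = 0; j < n; j++)
            if (!slot_caught_up(names[j], slots, nslots, wait_for_lsn))
                all_ok = false;

        if (all_ok)
            return restarts;
    }
    return -1;
}

/* Demo data: sb1 lags behind LSN 100, sb2 is already past it. */
const DemoSlot demo_slots[] = {{"sb1", 50}, {"sb2", 200}};

const char *demo_config_sb1(int iteration) { (void) iteration; return "sb1"; }
const char *demo_config_sb2(int iteration) { (void) iteration; return "sb2"; }

/* Simulates an operator changing the setting after three iterations. */
const char *demo_config_changes(int iteration)
{
    return iteration < 3 ? "sb1" : "sb2";
}
```

With the demo data, waiting on "sb1" alone never completes, while switching the list to "sb2" part-way through lets the wait finish after one restart. Without the re-read inside the loop, the walsender in this sketch would keep waiting on the stale list.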

The changes are mostly in patch001, with a few in patch002.

Thank you, Ajin, for working on the alter-subscription changes and
adding more TAP tests for 'failover'.

[1]: https://www.postgresql.org/message-id/2742485f-4118-4fb4-9f94-8150de9e7d7e%40gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1JcBG6TJ3o5iUd4z0BuTbciLV3dK4aKgb7OgrNGoLcfSQ%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAA4eK1J6BqO5%3DueFAQO%2BaYyHLaU-oCHrrVFJqHS-i0Ce9aPY2w%40mail.gmail.com

thanks
Shveta

Attachments:
v25-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch (application/octet-stream, 103.0 KB)
v25-0002-Add-logical-slot-sync-capability-to-physical-sta.patch (application/octet-stream, 109.8 KB)
