Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-15 11:51:14
Message-ID: CAJpy0uAT-oGsgmb4dC=dYYRNY7WXEAAWGWU2kL5phLz3Wcdifg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 14, 2023 at 7:56 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >> Yeah, good point, agree to just error out in all cases then (if we discard the
> >> sync_ reserved wording proposal, which seems to be the case as probably not
> >> worth the extra work).
> >
> > Thanks for the discussion!
> >
> > Here is the V33 patch set which includes the following changes:
>
> Thanks for working on it!
>
> >
> > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver.
>
> @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
>           * this only after failure, so when you promote, we still
>           * finish replaying as much as we can from archive and
>           * pg_wal before failover.
> +         *
> +         * Drop the slots for which sync is initiated but not yet
> +         * completed i.e. they are still waiting for the primary
> +         * server to catch up.
>           */
>          if (StandbyMode && CheckForStandbyTrigger())
>          {
>              XLogShutdownWalRcv();
> +            slotsync_drop_initiated_slots();
>              return XLREAD_FAIL;
>          }
>
> I had a closer look and it seems this is not located at the right place.
>
> Indeed, it's added here:
>
> switch (currentSource)
> {
>     case XLOG_FROM_ARCHIVE:
>     case XLOG_FROM_PG_WAL:
>
> While in our case we are in
>
> case XLOG_FROM_STREAM:
>
> So I think we should move slotsync_drop_initiated_slots() into the
> XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
> (the TODO item number 2 you mentioned up-thread)
>
> BTW, in order to prevent any corner case, wouldn't it also be better to
>
> replace:
>
> + /*
> +  * Do not allow consumption of a "synchronized" slot until the standby
> +  * gets promoted.
> +  */
> + if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE))
>
> with something like:
>
> if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || slot->data.sync_state == SYNCSLOT_STATE_INITIATED)
>
> to ensure slots in the 'i' state can never be used?
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
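
To make the difference between the two checks quoted above concrete,
here is a minimal standalone sketch. It is not code from the patch: the
enum, its values (including SYNCSLOT_STATE_READY) and the in_recovery
parameter, which stands in for RecoveryInProgress(), are assumptions
made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the patch's sync states (values assumed). */
typedef enum
{
    SYNCSLOT_STATE_NONE,        /* regular slot, not a synced one */
    SYNCSLOT_STATE_INITIATED,   /* the 'i' state: sync started, not completed */
    SYNCSLOT_STATE_READY        /* assumed name for a fully synced slot */
} SyncSlotState;

/* Check as quoted from the patch: block synced slots only during recovery. */
static bool
consume_blocked_current(bool in_recovery, SyncSlotState state)
{
    return in_recovery && state != SYNCSLOT_STATE_NONE;
}

/* Suggested check: additionally block 'i' slots regardless of recovery. */
static bool
consume_blocked_proposed(bool in_recovery, SyncSlotState state)
{
    return (in_recovery && state != SYNCSLOT_STATE_NONE) ||
           state == SYNCSLOT_STATE_INITIATED;
}

/* Walk the whole truth table and confirm where the two checks differ. */
static void
check_cases(void)
{
    for (int rec = 0; rec <= 1; rec++)
    {
        for (int s = SYNCSLOT_STATE_NONE; s <= SYNCSLOT_STATE_READY; s++)
        {
            bool cur = consume_blocked_current(rec != 0, (SyncSlotState) s);
            bool prop = consume_blocked_proposed(rec != 0, (SyncSlotState) s);
            bool differs = (rec == 0 && s == SYNCSLOT_STATE_INITIATED);

            /* The checks disagree only for an 'i' slot outside recovery. */
            assert((cur != prop) == differs);
        }
    }
}
```

The two predicates disagree only for a slot still in the 'i' state once
recovery has ended, which is the corner case the suggestion is meant to
close.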

Please find attached v34. It changes patch 0002 from a multi-worker to a
single-worker design, as per the discussion in [1] and [2].

Please note that the TODO list mentioned in [3] is still pending and
will be implemented in the next version.

[1]: https://www.postgresql.org/message-id/CAA4eK1JzYoHu2r%3D%2BKwn%2BN4ZgVcWKtdX_yLSNyTqjdWGkr-q0iA%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/e7b63103-2a8c-4ee9-866a-ddba45ead388%40gmail.com
[3]: https://www.postgresql.org/message-id/OS0PR01MB5716CE0729CEB3B5994A954194B3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

thanks
Shveta

Attachment | Content-Type | Size
v34-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch | application/octet-stream | 125.7 KB
v34-0002-Add-logical-slot-sync-capability-to-the-physical.patch | application/octet-stream | 99.3 KB
v34-0003-Allow-slot-sync-worker-to-wait-for-the-cascading.patch | application/octet-stream | 7.9 KB
