Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-14 14:26:49
Message-ID: 46070646-9e09-4566-8a62-ae31a12a510c@gmail.com
Lists: pgsql-hackers

Hi,

On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>> Yeah good point, agree to just error out in all the case then (if we discard the
>> sync_ reserved wording proposal, which seems to be the case as probably not
>> worth the extra work).
>
> Thanks for the discussion!
>
> Here is the V33 patch set which includes the following changes:

Thanks for working on it!

>
> 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver.

@@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                  * this only after failure, so when you promote, we still
                  * finish replaying as much as we can from archive and
                  * pg_wal before failover.
+                 *
+                 * Drop the slots for which sync is initiated but not yet
+                 * completed i.e. they are still waiting for the primary
+                 * server to catch up.
                  */
                 if (StandbyMode && CheckForStandbyTrigger())
                 {
                     XLogShutdownWalRcv();
+                    slotsync_drop_initiated_slots();
                     return XLREAD_FAIL;
                 }

I had a closer look and it seems this is not located at the right place.

Indeed, it's added here:

    switch (currentSource)
    {
        case XLOG_FROM_ARCHIVE:
        case XLOG_FROM_PG_WAL:

While in our case we are in

        case XLOG_FROM_STREAM:

So I think we should move slotsync_drop_initiated_slots() into the
XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker
(the TODO item number 2 you mentioned up-thread)?

BTW, in order to prevent any corner cases, wouldn't it also be better to
replace:

+ /*
+ * Do not allow consumption of a "synchronized" slot until the standby
+ * gets promoted.
+ */
+ if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE))

with something like:

if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || slot->data.sync_state == SYNCSLOT_STATE_INITIATED)

to ensure slots in the 'i' state can never be used?
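
To make the difference between the two conditions concrete, here is a minimal
standalone sketch (not patch code). The enum is a stand-in for the patch's
sync_state values: the character values and the SYNCSLOT_STATE_READY state are
assumptions for illustration, and the in_recovery flag stands in for
RecoveryInProgress(). The helper names blocked_current(), blocked_proposed()
and check_conditions() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-ins for the patch's slot sync states. The character values and
 * the READY state are assumptions made for this sketch.
 */
typedef enum
{
    SYNCSLOT_STATE_NONE = 'n',
    SYNCSLOT_STATE_INITIATED = 'i',
    SYNCSLOT_STATE_READY = 'r'
} SyncSlotState;

/* Condition as written in the quoted hunk. */
static bool
blocked_current(bool in_recovery, SyncSlotState state)
{
    return in_recovery && (state != SYNCSLOT_STATE_NONE);
}

/* Condition suggested above: an 'i' slot is refused unconditionally. */
static bool
blocked_proposed(bool in_recovery, SyncSlotState state)
{
    return (in_recovery && (state != SYNCSLOT_STATE_NONE)) ||
        state == SYNCSLOT_STATE_INITIATED;
}

/*
 * Walk all six combinations: the two conditions should disagree only
 * for an 'i' slot once recovery has ended.
 */
static void
check_conditions(void)
{
    const SyncSlotState states[] = {
        SYNCSLOT_STATE_NONE, SYNCSLOT_STATE_INITIATED, SYNCSLOT_STATE_READY
    };

    for (int r = 0; r < 2; r++)
    {
        for (int s = 0; s < 3; s++)
        {
            bool in_recovery = (r == 1);
            bool differs = blocked_current(in_recovery, states[s]) !=
                blocked_proposed(in_recovery, states[s]);
            bool expected = !in_recovery &&
                states[s] == SYNCSLOT_STATE_INITIATED;

            assert(differs == expected);
        }
    }
}
```

So the only behavior change would be for an 'i' slot that somehow survives
promotion, which is exactly the corner case the extra check is meant to cover.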

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
