RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-12-18 12:11:22
Message-ID: OS0PR01MB571698173C36491E62703E039490A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, December 11, 2023 5:31 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Dec 7, 2023 at 1:33 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> >
> > Hi.
> >
> > Here are my review comments for patch v43-0002.
> >
>
> > ======
> > src/backend/access/transam/xlogrecovery.c
> >
> > 13.
> > + /*
> > + * Shutdown the slot sync workers to prevent potential conflicts between
> > + * user processes and slotsync workers after a promotion. Additionally,
> > + * drop any slots that have initiated but not yet completed the sync
> > + * process.
> > + */
> > + ShutDownSlotSync();
> > + slotsync_drop_initiated_slots();
> > +
> >
> > Is this where maybe the 'sync_state' should also be updated for
> > everything so you are not left with confusion about different states
> > on a node that is no longer a standby node?
> >
>
> yes, this is the place. But this needs more thought as it may cause
> too much disk activity during promotion. so let me analyze and come
> back.

Per off-list discussion with Amit.

I think it's fine to keep both READY and NONE on a primary. Because even if we
update the sync_state from READY to NONE on promotion, it doesn't reduce the
complexity for the handling of READY and NONE state. And it's not
straightforward to choose the right place to update sync_state, here is the
analysis:

(related steps on promotion)
1 (patch) shutdown slotsync worker
2 (patch) drop 'i' state slots.
3 remove standby.signal and recovery.signal
4 switch to a new timeline and write the timeline history file
5 set SharedRecoveryState = RECOVERY_STATE_DONE which means RecoveryInProgress() will return false.

We could not update the sync_state before step 3 because if the update fails after
updating some of slots' state, then the server will be shutdown leaving some
'NONE' state slots. After restarting, the server is still a standby so the slot
sync worker will fail to sync these 'NONE' state slots.

We also could not update it after step 3 and before step 4. Because if any ERROR
when updating, then after restarting the server, although the server will
become a master(as standby.signal is removed), but it can still be made as a
active standby by creating a standby.signal file because the timeline has not
been switched. And in this case, the slot sync worker will also fail to sync
these 'NONE' state slots.

Updating the sync_state after step 4 and before step 5 is OK, but still
It doesn't simplify the handling for both READY and NONE state slots.
Therefore, I think we can retain the READY state slots after promotion as they
can provide information about the slot's origin. I added some comments around
slotsync cleanup codes (in FinishWalRecovery) to mentioned the reason.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2023-12-18 12:12:26 RE: Synchronizing slots from primary to standby
Previous Message Euler Taveira 2023-12-18 11:58:14 Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation