Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-26 11:52:20
Message-ID: CAA4eK1LJdmGATWG=xOD1CB9cogukk2cLNBGH8h-n-ZDJuwBdJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Attach the V98 patch set which addressed above comments.
>

Few comments:
=============
1.
WalSndWaitForWal(XLogRecPtr loc)
{
int wakeEvents;
+ bool wait_for_standby = false;
+ uint32 wait_event;
+ List *standby_slots = NIL;
static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;

+ if (MyReplicationSlot->data.failover && replication_active)
+ standby_slots = GetStandbySlotList(true);
+
/*
- * Fast path to avoid acquiring the spinlock in case we already know we
- * have enough WAL available. This is particularly interesting if we're
- * far behind.
+ * Check if all the standby servers have confirmed receipt of WAL up to
+ * RecentFlushPtr even when we already know we have enough WAL available.
+ *
+ * Note that we cannot directly return without checking the status of
+ * standby servers because the standby_slot_names may have changed, which
+ * means there could be new standby slots in the list that have not yet
+ * caught up to the RecentFlushPtr.
*/
- if (RecentFlushPtr != InvalidXLogRecPtr &&
- loc <= RecentFlushPtr)
- return RecentFlushPtr;
+ if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr)
+ {
+ FilterStandbySlots(RecentFlushPtr, &standby_slots);

I think even if the slot list is not changed, we will always process
each slot mentioned in standby_slot_names once. Can't we cache the
previous list of slots for we have already waited for? In that case,
we won't even need to copy the list via GetStandbySlotList() unless we
need to wait.

2.
WalSndWaitForWal(XLogRecPtr loc)
{
+ /*
+ * Update the standby slots that have not yet caught up to the flushed
+ * position. It is good to wait up to RecentFlushPtr and then let it
+ * send the changes to logical subscribers one by one which are
+ * already covered in RecentFlushPtr without needing to wait on every
+ * change for standby confirmation.
+ */
+ if (wait_for_standby)
+ FilterStandbySlots(RecentFlushPtr, &standby_slots);
+
/* Update our idea of the currently flushed position. */
- if (!RecoveryInProgress())
+ else if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr(NULL);
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
...
/*
* If postmaster asked us to stop, don't wait anymore.
*
* It's important to do this check after the recomputation of
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
if (got_STOPPING)
break;

I think because 'wait_for_standby' may not be set in the first or
consecutive cycles we may send the WAL to the logical subscriber
before sending it to the physical subscriber during shutdown.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2024-02-26 12:01:45 Re: RFC: Logging plan of the running query
Previous Message Amit Kapila 2024-02-26 11:48:25 Re: Synchronizing slots from primary to standby