Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-29 11:17:29
Message-ID: CAA4eK1KC9tb0AUKKnkqyuXXc-B89KQDk5NEWxhm15-dPG2pt3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Few additional comments on the latest patch:
> =================================
> 1.
> static XLogRecPtr
> WalSndWaitForWal(XLogRecPtr loc)
> {
> ...
> + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr &&
> + (!replication_active || StandbyConfirmedFlush(loc, WARNING)))
> + {
> + /*
> + * Fast path to avoid acquiring the spinlock in case we already know
> + * we have enough WAL available and all the standby servers have
> + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly
> + * interesting if we're far behind.
> + */
> return RecentFlushPtr;
> + }
> ...
> ...
> + * Wait for WALs to be flushed to disk only if the postmaster has not
> + * asked us to stop.
> + */
> + if (loc > RecentFlushPtr && !got_STOPPING)
> + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL;
> +
> + /*
> + * Check if the standby slots have caught up to the flushed position.
> + * It is good to wait up to RecentFlushPtr and then let it send the
> + * changes to logical subscribers one by one which are already covered
> + * in RecentFlushPtr without needing to wait on every change for
> + * standby confirmation. Note that after receiving the shutdown signal,
> + * an ERROR is reported if any slots are dropped, invalidated, or
> + * inactive. This measure is taken to prevent the walsender from
> + * waiting indefinitely.
> + */
> + else if (replication_active &&
> + !StandbyConfirmedFlush(RecentFlushPtr, WARNING))
> + {
> + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION;
> + wait_for_standby = true;
> + }
> + else
> + {
> + /* Already caught up and doesn't need to wait for standby_slots. */
> break;
> + }
> ...
> }
>
> Can we try to move these checks into a separate function that returns
> a boolean and has an out parameter as wait_event?
>
> 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup?
>

Some more comments:
==================
1.
+ foreach_ptr(char, name, elemlist)
+ {
+ ReplicationSlot *slot;
+
+ slot = SearchNamedReplicationSlot(name, true);
+
+ if (!slot)
+ {
+ GUC_check_errdetail("replication slot \"%s\" does not exist",
+ name);
+ ok = false;
+ break;
+ }
+
+ if (!SlotIsPhysical(slot))
+ {
+ GUC_check_errdetail("\"%s\" is not a physical replication slot",
+ name);
+ ok = false;
+ break;
+ }
+ }

Won't the second check need protection via ReplicationSlotControlLock?

2. In WaitForStandbyConfirmation(), we are anyway calling
StandbyConfirmedFlush() before the actual sleep in the loop, so does
calling it at the beginning of the function will serve any purpose? If
so, it is better to add some comments explaining the same.

3. Also do we need to perform the validation checks done in
StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We
can probably separate those checks and perform them just once.

4.
+   *
+   * XXX: If needed, this can be attempted in future.

Remove this part of the comment.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-02-29 11:35:50 Re: Synchronizing slots from primary to standby
Previous Message Alvaro Herrera 2024-02-29 10:46:42 Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock