From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-08-05 03:58:27
Message-ID: CAJpy0uAoyk+ASXpbOa8exk60XutKyHtuBXhRo0ccRO3svTSO_Q@mail.gmail.com
Lists: pgsql-hackers
On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > 5)
> > > > I tried a test where there were 4 slots on the publisher, where one
> > > > was getting used while the others were not. Initiated
> > > > pg_sync_replication_slots on standby. Forced unused slots to be
> > > > invalidated by setting idle_replication_slot_timeout=60 on primary,
> > > > due to which API finished but gave a warning:
> > > >
> > > > postgres=# SELECT pg_sync_replication_slots();
> > > > WARNING: aborting initial sync for slot "failover_slot"
> > > > DETAIL: This slot was invalidated on the primary server.
> > > > WARNING: aborting initial sync for slot "failover_slot2"
> > > > DETAIL: This slot was invalidated on the primary server.
> > > > WARNING: aborting initial sync for slot "failover_slot3"
> > > > DETAIL: This slot was invalidated on the primary server.
> > > > pg_sync_replication_slots
> > > > ---------------------------
> > > >
> > > > (1 row)
> > > >
> > > > Do we need these warnings here? I think we can emit this as a LOG
> > > > message rather than a WARNING on the console. Thoughts?
> > > >
> > >
> > > What is the behaviour of a slotsync worker in the same case? I don't
> > > see any such WARNING messages in the code of slotsync worker, so why
> > > do we want a different behaviour here?
> > >
> >
> > We don’t have continuous waiting in the slot-sync worker if the remote
> > slot is behind the local slot. But if during the first sync cycle the
> > remote slot is behind, we keep the local slot as a temporary slot. In
> > the next sync cycle, if we find the remote slot is invalidated, we
> > mark the local slot as invalidated too, keeping it in this temporary
> > state. There are no LOG or WARNING messages in this case. When the
> > slot-sync worker stops or shuts down (like during promotion), it
> > cleans up this temporary slot.
> >
> > Now, for the API behavior: if the remote slot is behind the local
> > slot, the API enters a wait loop and logs:
> >
> > LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and
> > catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin
> > (770)
> >
> > If it keeps waiting, every 10 seconds it logs:
> > LOG: continuing to wait for remote slot "failover_slot" LSN
> > (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
> > and catalog xmin (770)
> >
> > If the remote slot becomes invalidated during this wait, currently it
> > logs a WARNING and moves to syncing the next slot:
> > WARNING: aborting initial sync for slot "failover_slot" as the slot
> > was invalidated on primary
> >
> > I think this WARNING is too strong. We could change it to a LOG
> > message instead, mark the local slot as invalidated, exit the wait
> > loop, and move on to syncing the next slot.
> >
> > Even though the slot-sync worker does not emit this LOG, I think it
> > makes more sense in the API case: the continuous LOG messages indicate
> > that the API is waiting in the wait-loop to sync this slot, so we
> > should conclude the wait explicitly, either by reporting that the wait
> > is over (as we do on a successful sync) or with something like
> > 'LOG: aborting wait as the remote slot was invalidated' instead of
> > the above WARNING message. What do you suggest?
> >
>
> I also think LOG is a better choice for this because there is nothing
> we can expect users to do even after seeing this. I feel this is more
> of an info for users.
>
Yes, it is more of an info for users.
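For reference, the wait-loop behavior described above (initial LOG, a "continuing to wait" LOG every 10 seconds, and a LOG-then-abort on invalidation) can be modeled with a small standalone sketch. This is not the actual PostgreSQL implementation; all names, the polling cadence, and the simulated remote progress are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdio.h>

/* simplified stand-ins for XLogRecPtr / TransactionId */
typedef struct RemoteSlot
{
	long	restart_lsn;
	int		catalog_xmin;
	bool	invalidated;
} RemoteSlot;

/*
 * Poll the remote slot until it passes the local slot's position, or
 * until it is invalidated.  Returns true once the remote slot has
 * caught up, false when the wait is aborted.
 */
static bool
wait_for_remote_catchup(RemoteSlot *remote, long local_lsn, int local_xmin)
{
	int		waited = 0;

	printf("LOG: waiting for remote slot to pass local LSN %ld and xmin %d\n",
		   local_lsn, local_xmin);

	for (;;)
	{
		if (remote->invalidated)
		{
			/* LOG, not WARNING: nothing actionable for the user */
			printf("LOG: aborting wait as the remote slot was invalidated\n");
			return false;
		}

		if (remote->restart_lsn >= local_lsn &&
			remote->catalog_xmin >= local_xmin)
			return true;

		/* report every 10 simulated seconds that we are still waiting */
		waited += 2;			/* pretend each poll sleeps 2 seconds */
		if (waited % 10 == 0)
			printf("LOG: continuing to wait for remote slot\n");

		/* the real code would re-fetch the remote slot here */
		remote->restart_lsn += 16;	/* simulate remote progress */
	}
}
```

The key point the sketch captures is that both exit paths of the loop are concluded with a LOG line, so a reader following the log sees the wait explicitly resolved either way.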
> 1.
> update_and_persist_local_synced_slot()
> {
> ...
> + /*
> + * For SQL API synchronization, we wait for the remote slot to catch up
> + * here, since we can't assume the SQL API will be called again soon.
> + * We will retry the sync once the slot catches up.
> + *
> + * Note: This will return false if a promotion is triggered on the
> + * standby while waiting, in which case we stop syncing and drop the
> + * temporary slot.
> + */
> + if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
> + return false;
>
> Why is the wait added at this level? Shouldn't it be at API level aka
> in SyncReplicationSlots() or pg_sync_replication_slots() similar to
> what we do in ReplSlotSyncWorkerMain() for slotsync workers?
The initial goal was to perform a single sync cycle over all slots. The
logic was simple: if any slot couldn't be synced because its remote
slot was lagging, we would wait for the remote slot to catch up, and
only then move on to the next slot.
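That single-cycle shape can be sketched as follows. This is a hypothetical, simplified model, not the patch itself; `SlotPair`, `wait_for_catchup`, and `sync_slots_once` are illustrative names:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct SlotPair
{
	long	remote_lsn;
	long	local_lsn;
	bool	synced;
} SlotPair;

/* stand-in for the real wait; here the remote simply catches up */
static bool
wait_for_catchup(SlotPair *s)
{
	s->remote_lsn = s->local_lsn;
	return true;
}

/*
 * One sync cycle: every slot is visited exactly once, and a lagging
 * remote slot blocks progress inline until it catches up (or the wait
 * is aborted).  There is no second cycle, so remote slots created
 * later are simply not picked up.
 */
static int
sync_slots_once(SlotPair *slots, size_t nslots)
{
	int		nsynced = 0;

	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].remote_lsn < slots[i].local_lsn &&
			!wait_for_catchup(&slots[i]))
			continue;			/* wait aborted: skip this slot */

		slots[i].synced = true;
		nsynced++;
	}
	return nsynced;
}
```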
But if we move the wait logic to SyncReplicationSlots(), the API will
necessarily have to attempt to sync all slots in the first cycle,
skipping those whose remote slots are lagging, and then run additional
sync cycles until all slots are successfully synced. If new remote
slots are added in the meantime, they will be picked up in the next
cycle, the API will have to wait on those as well, and the whole
process may run for longer.
If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.
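A rough sketch of that failed-slot-list alternative, under illustrative assumptions (the `Slot` struct, `try_sync`, and the retry policy are all hypothetical, and a real version would refresh remote state between cycles):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 64

typedef struct Slot
{
	int		id;
	int		attempts_needed;	/* succeeds on this attempt number */
	int		attempts;
	bool	synced;
} Slot;

/* stand-in for one sync attempt against the remote slot */
static bool
try_sync(Slot *s)
{
	s->attempts++;
	if (s->attempts >= s->attempts_needed)
		s->synced = true;
	return s->synced;
}

/*
 * First cycle attempts every slot and records the failures; later
 * cycles retry only that fixed set, so remote slots created after the
 * first cycle are deliberately ignored.
 */
static int
sync_with_retry_list(Slot *slots, size_t nslots)
{
	Slot   *failed[MAX_SLOTS];
	size_t	nfailed = 0;

	/* first cycle: attempt every slot, collect the failures */
	for (size_t i = 0; i < nslots; i++)
		if (!try_sync(&slots[i]))
			failed[nfailed++] = &slots[i];

	/* later cycles: retry only the recorded failures */
	while (nfailed > 0)
	{
		size_t	remaining = 0;

		for (size_t i = 0; i < nfailed; i++)
			if (!try_sync(failed[i]))
				failed[remaining++] = failed[i];
		nfailed = remaining;
	}

	return (int) nslots;
}
```

The extra bookkeeping (the failure list and its compaction each cycle) is the complexity referred to above, on top of deciding how long the retry cycles may run.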
IMO, attempting multiple sync cycles essentially makes the API behave
more like the slot-sync worker, which might not be desirable. I feel
that performing just one sync cycle in the API is more in line with
the expected behavior, and for that the current placement of the
wait logic is simpler. But let me know if you think otherwise or if I
have not understood your point clearly. I am open to other
approaches/ideas here.
thanks
Shveta