Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-08-04 06:49:40
Message-ID: CAJpy0uAFSyORpSs99aTBHJ+kEy+4hsjfQAJYHmGy6i+sCB7Now@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > 5)
> > I tried a test where there were 4 slots on the publisher, where one
> > was getting used while the others were not. Initiated
> > pg_sync_replication_slots on standby. Forced unused slots to be
> > invalidated by setting idle_replication_slot_timeout=60 on primary,
> > due to which API finished but gave a warning:
> >
> > postgres=# SELECT pg_sync_replication_slots();
> > WARNING: aborting initial sync for slot "failover_slot"
> > DETAIL: This slot was invalidated on the primary server.
> > WARNING: aborting initial sync for slot "failover_slot2"
> > DETAIL: This slot was invalidated on the primary server.
> > WARNING: aborting initial sync for slot "failover_slot3"
> > DETAIL: This slot was invalidated on the primary server.
> > pg_sync_replication_slots
> > ---------------------------
> >
> > (1 row)
> >
> > Do we need these warnings here? I think we can have it as a LOG rather
> > than having it on console. Thoughts?
> >
>
> What is the behaviour of a slotsync worker in the same case? I don't
> see any such WARNING messages in the code of slotsync worker, so why
> do we want a different behaviour here?
>

We don’t have continuous waiting in the slot-sync worker if the remote
slot is behind the local slot. But if during the first sync cycle the
remote slot is behind, we keep the local slot as a temporary slot. In
the next sync cycle, if we find the remote slot is invalidated, we
mark the local slot as invalidated too, keeping it in this temporary
state. There are no LOG or WARNING messages in this case. When the
slot-sync worker stops or shuts down (like during promotion), it
cleans up this temporary slot.
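To make the worker-side behavior above concrete, here is a minimal C sketch of one sync cycle for a single slot. It is a simplified model, not PostgreSQL's slotsync.c. The types LocalSlot and RemoteSlot and the functions worker_sync_cycle() and worker_shutdown() are hypothetical. LSNs are plain 64-bit integers, and xids are compared with a plain `<` (no wraparound handling).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a local synced slot (not PostgreSQL's
 * ReplicationSlot struct). */
typedef struct
{
    uint64_t restart_lsn;   /* LSN as a plain 64-bit integer */
    uint32_t catalog_xmin;  /* compared without wraparound in this sketch */
    bool     temporary;     /* stays true until a sync succeeds */
    bool     invalidated;
    bool     exists;        /* false once the slot has been dropped */
} LocalSlot;

typedef struct
{
    uint64_t restart_lsn;
    uint32_t catalog_xmin;
    bool     invalidated;
} RemoteSlot;

/* One worker sync cycle for a single slot (sketch). */
static void
worker_sync_cycle(LocalSlot *local, const RemoteSlot *remote)
{
    assert(local->exists);

    if (remote->invalidated)
    {
        /* Mirror the invalidation but stay temporary; no LOG or WARNING. */
        local->invalidated = true;
        return;
    }

    if (local->invalidated)
        return;

    /* Remote still behind: keep the temporary slot and retry next cycle. */
    if (remote->restart_lsn < local->restart_lsn ||
        remote->catalog_xmin < local->catalog_xmin)
        return;

    /* Remote has caught up: copy its position and persist the slot. */
    local->restart_lsn = remote->restart_lsn;
    local->catalog_xmin = remote->catalog_xmin;
    local->temporary = false;
}

/* Worker stop or shutdown (e.g. on promotion): drop still-temporary slots. */
static void
worker_shutdown(LocalSlot *local)
{
    if (local->exists && local->temporary)
        local->exists = false;
}
```

The point of the model is that an invalidated remote slot is mirrored silently and the temporary local slot only disappears at worker shutdown.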

Now, for the API behavior: if the remote slot is behind the local
slot, the API enters a wait loop and logs:

LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin (770)
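In this example the two LSNs are equal, so it is the catalog xmin (755 on the remote versus 770 locally) that keeps the API waiting. Below is a sketch of that check. It assumes the remote slot counts as behind when either its LSN or its catalog xmin is older than the local slot's. The names are hypothetical. xid_precedes() imitates the circular comparison PostgreSQL's TransactionIdPrecedes() applies to normal xids, and it ignores the special xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrSketch;  /* hypothetical: an LSN as a 64-bit integer */
typedef uint32_t XidSketch;         /* hypothetical: a 32-bit transaction id */

/* Build an LSN from its "hi/lo" text form, e.g. 0/3000060 -> (0, 0x3000060). */
static XLogRecPtrSketch
lsn_make(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Circular comparison for normal xids (>= 3), modeled on the logic of
 * TransactionIdPrecedes(); special xids are not handled here. */
static bool
xid_precedes(XidSketch a, XidSketch b)
{
    assert(a >= 3 && b >= 3);
    return (int32_t) (a - b) < 0;
}

/* The remote slot is "behind" if either its LSN or its catalog xmin is
 * older than the local slot's; the API keeps waiting while this holds. */
static bool
remote_is_behind(XLogRecPtrSketch remote_lsn, XidSketch remote_xmin,
                 XLogRecPtrSketch local_lsn, XidSketch local_xmin)
{
    return remote_lsn < local_lsn || xid_precedes(remote_xmin, local_xmin);
}
```

With the values from the log, remote_is_behind(lsn_make(0, 0x3000060), 755, lsn_make(0, 0x3000060), 770) is true, purely because of the xmin.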

If it keeps waiting, every 10 seconds it logs:

LOG: continuing to wait for remote slot "failover_slot" LSN (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin (770)
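One way to get the "log once, then every 10 seconds" cadence described above is to remember when the last message was emitted and compare against the current time on each iteration of the wait loop. The sketch below does this with a caller-supplied millisecond clock, so it can be exercised without sleeping. WaitLogState, maybe_log_wait() and WAIT_LOG_INTERVAL_MS are hypothetical names, and the message text is abbreviated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAIT_LOG_INTERVAL_MS 10000  /* "every 10 seconds", per the thread */

typedef struct
{
    int64_t last_log_ms;  /* -1 until the first message is emitted */
    int     messages;     /* number of messages emitted so far */
} WaitLogState;

/* Call on every wait-loop iteration with the current time in ms.
 * Emits the initial "waiting" message once, then a "continuing to wait"
 * message whenever at least WAIT_LOG_INTERVAL_MS has elapsed since the
 * previous one. Returns true if a message was emitted. */
static bool
maybe_log_wait(WaitLogState *st, int64_t now_ms, const char *slot)
{
    assert(now_ms >= 0);

    if (st->last_log_ms < 0)
        printf("LOG: waiting for remote slot \"%s\" ...\n", slot);
    else if (now_ms - st->last_log_ms >= WAIT_LOG_INTERVAL_MS)
        printf("LOG: continuing to wait for remote slot \"%s\" ...\n", slot);
    else
        return false;

    st->last_log_ms = now_ms;
    st->messages++;
    return true;
}
```

Throttling on elapsed time rather than on iteration count keeps the log volume independent of how often the loop polls the primary.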

If the remote slot becomes invalidated during this wait, it currently
logs a WARNING and moves on to syncing the next slot:

WARNING: aborting initial sync for slot "failover_slot" as the slot was invalidated on primary

I think this WARNING is too strong. We could change it to a LOG
message instead, mark the local slot as invalidated, exit the wait
loop, and move on to syncing the next slot.
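Here is a sketch of the proposed handling. It assumes the wait loop re-reads the remote slot on each iteration. On invalidation it emits a LOG instead of a WARNING, marks the local slot invalidated, leaves the wait loop, and lets the caller continue with the next slot. The polls array stands in for re-fetching the remote slot state from the primary. All type and function names, and the exact LOG wording, are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum { REMOTE_BEHIND, REMOTE_CAUGHT_UP, REMOTE_INVALIDATED } RemoteState;
typedef enum { SYNC_DONE, SYNC_STILL_WAITING, SYNC_ABORTED_INVALIDATED } SyncResult;

typedef struct
{
    const char *name;
    bool        invalidated;
} LocalSlotSketch;

/* Wait loop for one slot; polls[i] is the remote state seen on iteration i. */
static SyncResult
wait_for_slot(LocalSlotSketch *slot, const RemoteState *polls, int npolls)
{
    assert(npolls > 0);

    for (int i = 0; i < npolls; i++)
    {
        if (polls[i] == REMOTE_INVALIDATED)
        {
            /* Proposed: LOG rather than WARNING, mirror the invalidation
             * locally, and stop waiting for this slot. */
            printf("LOG: aborting wait as the remote slot \"%s\" was invalidated\n",
                   slot->name);
            slot->invalidated = true;
            return SYNC_ABORTED_INVALIDATED;
        }
        if (polls[i] == REMOTE_CAUGHT_UP)
            return SYNC_DONE;
        /* REMOTE_BEHIND: the real loop would sleep and re-fetch here. */
    }
    return SYNC_STILL_WAITING;
}

/* Sync each slot in turn; an invalidated slot does not stop the others.
 * Returns the number of slots that finished syncing. */
static int
sync_all(LocalSlotSketch *slots, const RemoteState *const *polls,
         const int *npolls, int nslots)
{
    int synced = 0;

    for (int i = 0; i < nslots; i++)
        if (wait_for_slot(&slots[i], polls[i], npolls[i]) == SYNC_DONE)
            synced++;
    return synced;
}
```

For example, with one slot whose remote is invalidated mid-wait and one that catches up, sync_all() marks the first slot invalidated, logs once at LOG level, and still finishes syncing the second.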

Even though the slot-sync worker does not emit such a LOG, I think it
makes more sense in the API case. The API has already emitted a series
of LOGs saying that it is waiting in the wait loop to sync this slot,
so we should conclude that wait explicitly: either by reporting that
the wait is over (as we do in the successful sync case), or by logging
'LOG: aborting wait as the remote slot was invalidated' instead of the
WARNING above. What do you suggest?

thanks
Shveta
