Re: Clear logical slot's 'synced' flag on promotion of standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Clear logical slot's 'synced' flag on promotion of standby
Date: 2025-09-19 09:26:13
Message-ID: CAJpy0uD0Du58r6DiKa_u-=vbFAL=uKNK5vr1E4tvC6fcnx=yPw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 18, 2025 at 4:16 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Sep 12, 2025 at 1:56 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > The approach seems valid and should work, but introducing a new file
> > like promote.inprogress for this purpose might be excessive. We can
> > first try analyzing existing information to determine whether we can
> > distinguish between the two scenarios -- a primary in crash recovery
> > immediately after a promotion attempt versus a regular primary. If we
> > are unable to find any way, we can revisit the idea.
> >
>
> I needed a way to reset slots not only during promotion, but also
> after a crash that occurs while slots are being reset, so there would
> be a fallback mechanism to clear them again on startup. As Shveta
> pointed out, it wasn’t trivial to tell apart a standby restarting
> after crashing during promotion from a primary restarting after a
> crash. So I decided to just reset slots every time a primary (or a
> standby after promotion) restarts.
>
> Because this fallback logic will run on every primary restart, it was
> important to minimize overhead added by the patch. After some
> discussion, I placed the reset logic in RestoreSlotFromDisk(), which
> is invoked by StartupReplicationSlots() whenever the server starts.
> Because RestoreSlotFromDisk() already loops through all slots, this
> adds minimal extra work while also ensuring the synced flag is cleared
> when running on a primary.

+1 for the idea. I would like to know what others think here.

> The next challenge was finding a reliable flag to distinguish
> primaries from standbys, since we really don’t want to reset the flag
> on a standby. I tested StandbyMode, RecoveryInProgress(), and
> InRecovery. But during restarts, both RecoveryInProgress() and
> InRecovery are always true on both primary and standby. In all my
> testing, StandbyMode was the only variable that consistently
> differentiated between the two, which is what I used.
>
> I have also changed the documentation and comments regarding 'synced'
> flags not being reset on the primary.
>

Please find a few comments:

1)
+ * Reset all replication slots that have synced=true to synced=false.

Can we please change it to:
Reset the synced flag to false for all replication slots where it is
currently true.

2)
I was wondering: since we reset the synced flag every time we load
slots from disk, do we even need ResetSyncedSlots() during promotion?
I guess we still need it, because after a promotion without a restart
the existing backend sessions stay alive, and it makes sense for them
to see 'synced' as false too. Is it worth adding this as a comment
atop the ResetSyncedSlots() call during promotion?
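
To illustrate point 2, here is a minimal standalone sketch (not the
patch code) of a promotion-time pass over the in-memory slot array.
MockSlot, its fields, and mock_reset_synced_slots() are hypothetical
stand-ins for illustration; they are not PostgreSQL's structures or the
patch's ResetSyncedSlots().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared-memory slot entry. */
typedef struct MockSlot
{
    bool in_use;    /* entry currently holds a slot */
    bool synced;    /* slot was created by slot synchronization */
} MockSlot;

/*
 * Model of the promotion-time reset: clear 'synced' on every in-use
 * slot in the shared array, so that sessions which outlive the
 * promotion also observe synced = false without a restart.
 * Returns the number of slots whose flag was changed.
 */
int
mock_reset_synced_slots(MockSlot *slots, size_t nslots)
{
    int changed = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].in_use && slots[i].synced)
        {
            slots[i].synced = false;
            changed++;
        }
    }

    return changed;
}
```

The startup-time reset only runs when slots are loaded from disk, so
without a pass like this the shared in-memory state would keep the old
flag until the next restart.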

3)
+    if (!StandbyMode)
+        slot->data.synced = false;

a)
Do we need to mark the slot as dirty so it gets saved to disk on the
next chance?

I think ReplicationSlotSave can be skipped, as it may not be
appropriate in the restore flow. But marking the slot dirty is
important to avoid resetting the sync flag again on the next startup.
A crash between marking it dirty and persisting it would still require
a reset, but that seems acceptable. Thoughts?

b)
Also, if we are marking it dirty, it makes sense to set synced to false
only after checking that it is currently true, so that slots which were
never synced are not dirtied needlessly.

thanks
Shveta
