Re: Clear logical slot's 'synced' flag on promotion of standby

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clear logical slot's 'synced' flag on promotion of standby
Date: 2025-09-11 13:59:44
Message-ID: CAE9k0P=ODwH5aB-skBgffvDS010Jo1h=wGpLpE0aCqnqfx2+xg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 11, 2025 at 9:17 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Sep 9, 2025 at 2:19 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> >
> > + * required resources. Clear any leftover 'synced' flags on replication
> > + * slots when in crash recovery on the primary. The DB_IN_CRASH_RECOVERY
> > + * state check ensures that this code is only reached when a standby
> > + * server crashes during promotion.
> > */
> > StartupReplicationSlots();
> > + if (ControlFile->state == DB_IN_CRASH_RECOVERY)
> >
> > I believe the primary server can also enter the DB_IN_CRASH_RECOVERY
> > state. For example, if the primary is already in crash recovery and
> > crashes again while in crash recovery, it will restart in the
> > DB_IN_CRASH_RECOVERY state, no?
> >
>
> Yes, good point. I think we can differentiate the two cases based on
> the timeline change. A regular primary won't have a timeline change,
> whereas a promoted standby that failed during promotion will show a
> timeline change immediately upon restart. Thoughts?
>

Will there be any issues if we clear the sync status immediately after
the standby.signal file is removed from the standby server?

We could maybe introduce a temporary "promote.inprogress" marker file
on disk before removing standby.signal. The sequence would be:

1) Create promote.inprogress.
2) Unlink standby.signal.
3) Clear the slots' synced status.
4) Remove promote.inprogress.

This way, if the server crashes after standby.signal is removed but
before the sync status is cleared, the presence of promote.inprogress
would indicate that the standby was in the middle of promotion and
crashed before slot cleanup. On restart, we could use that marker to
detect the incomplete promotion and finish clearing the sync flags.

If the crash happens at a later stage, the server will no longer start
as a standby anyway, and by then the sync flags would already have
been reset.
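
To make the ordering concrete, here is a rough sketch of the sequence and
the restart check. It is a toy in-memory model, not PostgreSQL code: the
ClusterState struct, the CrashPoint enum, and the functions promote(),
restart() and clear_synced_flags() are all hypothetical names made up for
illustration, and the booleans merely stand in for the files on disk. What
to do when the marker exists but standby.signal is still present is my
assumption; the proposal above does not cover that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 4

/* Hypothetical model of the relevant on-disk state. */
typedef struct ClusterState
{
    bool standby_signal;          /* standby.signal exists */
    bool promote_inprogress;      /* proposed promote.inprogress marker exists */
    bool slot_synced[MAX_SLOTS];  /* per-slot 'synced' flag */
} ClusterState;

/* Points at which the simulated promotion can crash. */
typedef enum CrashPoint
{
    NO_CRASH,
    CRASH_AFTER_MARKER,   /* after step 1 */
    CRASH_AFTER_UNLINK,   /* after step 2 */
    CRASH_AFTER_CLEAR     /* after step 3 */
} CrashPoint;

/* Reset the 'synced' flag on every slot. */
void
clear_synced_flags(ClusterState *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < MAX_SLOTS; i++)
        s->slot_synced[i] = false;
}

/*
 * Run the four-step sequence. Returns true if promotion completed,
 * false if the simulated crash interrupted it.
 */
bool
promote(ClusterState *s, CrashPoint crash)
{
    assert(s != NULL);

    s->promote_inprogress = true;       /* 1) create promote.inprogress */
    if (crash == CRASH_AFTER_MARKER)
        return false;

    s->standby_signal = false;          /* 2) unlink standby.signal */
    if (crash == CRASH_AFTER_UNLINK)
        return false;

    clear_synced_flags(s);              /* 3) clear the synced status */
    if (crash == CRASH_AFTER_CLEAR)
        return false;

    s->promote_inprogress = false;      /* 4) remove promote.inprogress */
    return true;
}

/* Restart after a crash: use the marker to detect an incomplete promotion. */
void
restart(ClusterState *s)
{
    assert(s != NULL);

    if (!s->promote_inprogress)
        return;                         /* no promotion was in flight */

    /*
     * Marker present and standby.signal gone: finish the cleanup. Clearing
     * is idempotent, so it is harmless if step 3 had already run.
     *
     * Assumption: if standby.signal is still there, the server comes back
     * as a standby, so the flags are kept and only the stale marker goes.
     */
    if (!s->standby_signal)
        clear_synced_flags(s);

    s->promote_inprogress = false;
}
```

In this model, a crash at CRASH_AFTER_UNLINK leaves the marker in place
with the flags still set, and restart() then clears them. A crash at
CRASH_AFTER_CLEAR only leaves the marker behind, which restart() removes.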

This is just a thought and it may sound a bit naive. Let me know if I
am overlooking something.

--
With Regards,
Ashutosh Sharma.
