From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Clear logical slot's 'synced' flag on promotion of standby |
Date: | 2025-09-11 03:47:06 |
Message-ID: | CAJpy0uCqDM_AX3mL38PotB4M2ahoPYCfYeH3pT0kbYXsQ9ga4w@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Sep 9, 2025 at 2:19 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hi,
>
>
> + * required resources. Clear any leftover 'synced' flags on replication
> + * slots when in crash recovery on the primary. The DB_IN_CRASH_RECOVERY
> + * state check ensures that this code is only reached when a standby
> + * server crashes during promotion.
> */
> StartupReplicationSlots();
> + if (ControlFile->state == DB_IN_CRASH_RECOVERY)
>
> I believe the primary server can also enter the DB_IN_CRASH_RECOVERY
> state. For example, if the primary is already in crash recovery and
> crashes again while in crash recovery, it will restart in the
> DB_IN_CRASH_RECOVERY state, no?
>
Yes, good point. I think we can differentiate the two cases based on
the timeline change. A regular primary won't have a timeline change,
whereas a standby that crashed during promotion will show a
timeline change immediately upon restart. Thoughts?

In the worst-case scenario, even if we end up running the Reset
function during a regular primary's crash recovery, it shouldn't cause
any harm. (That said, I'm not suggesting we shouldn't fix it.) What
concerns me more is the possibility of running it on a regular
standby, as it could disrupt slot synchronization. I attempted to
simulate a scenario where a regular standby ends up in
DB_IN_CRASH_RECOVERY after a crash, but I couldn't reproduce it. Do
you know of any situation where this could happen? The absence of
comments for these states makes it challenging to follow the flow.
> --
>
> With this change, are we saying that on the primary the synced flag must
> always be false? Because the postgres doc on pg_replication_slots says:
>
> "The value of this column has no meaning on the primary server; the
> column value on the primary is default false for all slots but may (if
> leftover from a promoted standby) also be true."
>
The doc needs to be updated.
thanks
Shveta