Re: Clear logical slot's 'synced' flag on promotion of standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Clear logical slot's 'synced' flag on promotion of standby
Date: 2025-09-11 03:59:59
Message-ID: CAJpy0uCw7Ux7J=361n8p0+DBJpWsyEK0uQjONHFBJVjUprkS7g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 10, 2025 at 5:23 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Sep 8, 2025 at 11:21 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > This is a spin-off thread from [1].
> >
> > Currently, in the slot-sync worker, we have an error scenario [2]
> > where, during slot synchronization, if we detect a slot with the same
> > name and its synced flag is set to false, we emit an error. The
> > rationale is to avoid potentially overwriting a user-created slot.
> >
> > But while analyzing [1], we observed that this error can lead to
> > inconsistent behavior during switchovers. On the first switchover, the
> > new standby logs an error: "Exiting from slot synchronization because
> > a slot with the same name already exists on the standby." But during
> > a double switchover, this error does not occur.
> >
> > Upon re-evaluating this, it seems more appropriate to clear the synced
> > flag after promotion, as the flag does not hold any meaning on the
> > primary. Doing so would ensure consistent behavior across all
> > switchovers, as the same error will be raised avoiding the risk of
> > overwriting user's slots.
>
> There is the following comment in FinishWalRecovery():
>
> /*
> * Shutdown the slot sync worker to drop any temporary slots acquired by
> * it and to prevent it from keep trying to fetch the failover slots.
> *
> * We do not update the 'synced' column in 'pg_replication_slots' system
> * view from true to false here, as any failed update could leave 'synced'
> * column false for some slots. This could cause issues during slot sync
> * after restarting the server as a standby. While updating the 'synced'
> * column after switching to the new timeline is an option, it does not
> * simplify the handling for the 'synced' column. Therefore, we retain the
> * 'synced' column as true after promotion as it may provide useful
> * information about the slot origin.
> */
> ShutDownSlotSync();
>
> Does the patch address the above concerns?
>

Yes, the patch attempts to address the above concern: it resets the
'synced' column after switching to the new timeline. There is an
issue, though, as pointed out by Ashutosh in [1], which needs to be
addressed.
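
To make the intended behavior concrete, here is a minimal,
self-contained sketch of the name-conflict check and of clearing the
flag on promotion. It is only a toy model: ModelSlot, SyncResult,
model_check_sync() and model_clear_synced_on_promotion() are
hypothetical names made up for illustration and are neither taken from
the patch nor from the PostgreSQL sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a replication slot. */
typedef struct ModelSlot
{
    const char *name;
    bool        synced;     /* true if the slot was created by slot sync */
} ModelSlot;

typedef enum SyncResult
{
    SYNC_OK,                /* no local slot, or local slot has synced = true */
    SYNC_NAME_CONFLICT      /* same-name slot with synced = false: error out */
} SyncResult;

/*
 * Model of the check described upthread: while syncing a remote slot,
 * refuse to touch a local slot of the same name whose synced flag is
 * false, since it may be a user-created slot.
 */
static SyncResult
model_check_sync(const ModelSlot *local, size_t nlocal, const char *remote_name)
{
    for (size_t i = 0; i < nlocal; i++)
    {
        if (strcmp(local[i].name, remote_name) == 0)
            return local[i].synced ? SYNC_OK : SYNC_NAME_CONFLICT;
    }
    return SYNC_OK;
}

/*
 * Model of the proposed behavior: once the standby is promoted, the
 * synced flag carries no meaning, so clear it on every slot.
 */
static void
model_clear_synced_on_promotion(ModelSlot *slots, size_t nslots)
{
    for (size_t i = 0; i < nslots; i++)
        slots[i].synced = false;
}
```

In this model, a slot that still has synced = true when the node is
later demoted to a standby passes the check silently, whereas after
model_clear_synced_on_promotion() the same slot yields
SYNC_NAME_CONFLICT, which matches what already happens on the first
switchover.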

[1]: https://www.postgresql.org/message-id/CAE9k0P%3DWXRHXLGxkegFLj9tVLrY45%2BuTtdgv%2BPjt1mqyit4zZw%40mail.gmail.com

thanks
Shveta
