Re: failover logical replication slots

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: failover logical replication slots
Date: 2025-07-11 15:12:02
Message-ID: CAA5-nLC2__W71QmQtZ37Cm0-6jf5ZJUkjbb2QqrR1HYTNB3M=g@mail.gmail.com
Lists: pgsql-hackers

Hi Amit,
Here is a proposed patch to handle the problem of creating the logical
replication slot on the standby after a switchover.
Thank you for your comments and help on this issue.

Regards

Fabrice

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e..296840a 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
 	bool		slot_updated = false;
+	bool		overwriting_failover_slot = true;	/* could be a GUC */
 
 	/*
 	 * Make sure that concerned WAL is received and flushed before syncing
@@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
 	{
 		bool		synced;
+		bool		failover_status = remote_slot->failover;
 
 		SpinLockAcquire(&slot->mutex);
 		synced = slot->data.synced;
 		SpinLockRelease(&slot->mutex);
 
-		/* User-created slot with the same name exists, raise ERROR. */
-		if (!synced)
-			ereport(ERROR,
+		if (!synced){
+			/*
+			 * Check if we need to overwrite an existing failover slot and
+			 * if slot has the failover flag set to true
+			 * and the sync_replication_slots is on,
+			 * other check could be added here */
+			if (overwriting_failover_slot && failover_status && sync_replication_slots){
+
+				/* Get rid of a replication slot that is no longer wanted */
+				ReplicationSlotDrop(remote_slot->name, true);
+				ereport(WARNING,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("slot \"%s\" already exists"
+						   " on the standby but it will be dropped because overwriting_failover_slot is set to true",
+						   remote_slot->name));
+				return false;	/* Going back to the main loop after dropping the failover slot */
+			}
+			/* User-created slot with the same name exists, raise ERROR. */
+			else
+				ereport(ERROR,
 					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					errmsg("exiting from slot synchronization because same"
 						   " name slot \"%s\" already exists on the standby",
 						   remote_slot->name));
-
+		}
 		/*
 		 * The slot has been synchronized before.
 		 *
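
To make the intended behavior easier to review, the decision this patch adds
to synchronize_one_slot() can be sketched as a standalone function. This is
only an illustration under stated assumptions: the SlotSyncAction enum, its
values, and decide_slot_sync_action() are hypothetical names that do not exist
in PostgreSQL, and the real code drops the slot and reports through ereport()
instead of returning a value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes; these names are not part of PostgreSQL. */
typedef enum SlotSyncAction
{
	SLOT_SYNC_UPDATE,			/* local slot is a synced slot: keep syncing it */
	SLOT_SYNC_DROP_AND_RETRY,	/* drop the leftover slot, recreate next cycle */
	SLOT_SYNC_ERROR				/* user-created slot with same name: raise ERROR */
} SlotSyncAction;

/*
 * Mirror of the condition in the patch, for a local slot that has the same
 * name as a remote slot.
 *
 * local_synced              - slot->data.synced on the standby
 * remote_failover           - remote_slot->failover reported by the primary
 * sync_replication_slots    - value of the existing GUC of that name
 * overwriting_failover_slot - the new flag from the patch (hard-coded there,
 *                             "could be a GUC")
 */
SlotSyncAction
decide_slot_sync_action(bool local_synced,
						bool remote_failover,
						bool sync_replication_slots,
						bool overwriting_failover_slot)
{
	if (local_synced)
		return SLOT_SYNC_UPDATE;

	if (overwriting_failover_slot && remote_failover && sync_replication_slots)
		return SLOT_SYNC_DROP_AND_RETRY;

	return SLOT_SYNC_ERROR;
}

/* Small usage example covering the three outcomes. */
void
check_examples(void)
{
	/* Leftover failover slot after a switchover: dropped, then recreated. */
	assert(decide_slot_sync_action(false, true, true, true)
		   == SLOT_SYNC_DROP_AND_RETRY);

	/* Same situation with the new flag off: current behavior, ERROR. */
	assert(decide_slot_sync_action(false, true, true, false)
		   == SLOT_SYNC_ERROR);

	/* Already-synced slot: unaffected by the patch. */
	assert(decide_slot_sync_action(true, true, true, true)
		   == SLOT_SYNC_UPDATE);
}
```

With the flag off, the sketch falls back to the existing ERROR path, which is
why making it a GUC would keep the current behavior as an option.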

On Thu, Jun 12, 2025 at 4:27 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
wrote:

> Yes, of course, maybe for PG 19.
>
> Regards,
> Fabrice
>
> On Thu, Jun 12, 2025 at 12:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
>> On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
>> wrote:
>> >
>> > However, the problem still persists: it is currently not possible to
>> > perform an automatic switchover after creating a new subscription.
>> >
>> > Would it be reasonable to consider adding a GUC to address this issue?
>> > I can propose a patch in that sense if it seems appropriate.
>> >
>>
>> Yeah, we can consider that, though I don't know at this stage if a GUC
>> is the only way, but I hope you understand that it will be for PG19.
>>
>> --
>> With Regards,
>> Amit Kapila.
>>
>
