Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-17 11:38:14
Message-ID: 1e0b2eb4-c977-482d-b16e-c52711c34d6c@gmail.com
Lists: pgsql-hackers

Hi,

On 11/16/23 1:03 PM, shveta malik wrote:
> On Thu, Nov 16, 2023 at 3:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> PFA v35. It has below changes:

Thanks for the update!

> 6) shutdown the slotsync worker on promotion.

+ /*
+ * Shutdown the slot sync workers to prevent potential conflicts between
+ * user processes and slotsync workers after a promotion. Additionally,
+ * drop any slots that have initiated but not yet completed the sync
+ * process.
+ */
+ ShutDownSlotSync();
+ slotsync_drop_initiated_slots();

I think there is a corner case here.

If a promotion happens while slot creation is in progress (the slot has just
been created and is in the 'i' state), then when we shut down the slot sync worker
in ShutDownSlotSync(), we'll set slot->in_use = false in ReplicationSlotDropPtr().

Indeed, this is the call stack when we shut the slot sync worker down:

(gdb) bt
#0 ReplicationSlotDropPtr (slot=0x7f25af5c9bb0) at slot.c:734
#1 0x000056266c8106a7 in ReplicationSlotDropAcquired () at slot.c:725
#2 0x000056266c810170 in ReplicationSlotRelease () at slot.c:583
#3 0x000056266c80f420 in ReplicationSlotShmemExit (code=1, arg=0) at slot.c:189
#4 0x000056266c86213b in shmem_exit (code=1) at ipc.c:243
#5 0x000056266c861fdf in proc_exit_prepare (code=1) at ipc.c:198
#6 0x000056266c861f23 in proc_exit (code=1) at ipc.c:111
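To make the exit path above concrete, here is a minimal single-process model of it. It is a sketch under stated assumptions, not PostgreSQL code: ToySlot, toy_release(), toy_drop_ptr() and toy_worker_exit() are hypothetical stand-ins for ReplicationSlot, ReplicationSlotRelease(), ReplicationSlotDropPtr() and the ReplicationSlotShmemExit() callback. In PostgreSQL, ReplicationSlotRelease() drops the acquired slot when it is RS_EPHEMERAL; the model assumes (this is an assumption about the patch, not confirmed here) that a slot still in the 'i' state is ephemeral, which matches the ReplicationSlotRelease() -> ReplicationSlotDropAcquired() frames in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for PostgreSQL's slot structures. */
typedef enum { TOY_EPHEMERAL, TOY_PERSISTENT } ToyPersistency;

typedef struct
{
    bool           in_use;      /* models ReplicationSlot.in_use */
    char           name[64];    /* models ReplicationSlot.data.name */
    ToyPersistency persistency; /* models ReplicationSlot.data.persistency */
    char           sync_state;  /* 'i' = sync initiated, as described above */
} ToySlot;

/* Slot acquired by this (toy) process, playing the role of MyReplicationSlot. */
static ToySlot *my_slot = NULL;

/* Models ReplicationSlotDropPtr(): the slot's array entry becomes free. */
static void toy_drop_ptr(ToySlot *slot)
{
    slot->in_use = false;
}

/* Models ReplicationSlotRelease(): an ephemeral slot is dropped on release. */
static void toy_release(void)
{
    ToySlot *slot = my_slot;

    assert(slot != NULL);
    my_slot = NULL;
    if (slot->persistency == TOY_EPHEMERAL)
        toy_drop_ptr(slot);
}

/* Models the exit callback run when the slot sync worker is shut down. */
static void toy_worker_exit(void)
{
    if (my_slot != NULL)
        toy_release();
}

/* Worker creates a slot in the 'i' state, then promotion stops the worker. */
static ToySlot toy_simulate_shutdown_during_creation(void)
{
    ToySlot slot = { .in_use = true, .persistency = TOY_EPHEMERAL, .sync_state = 'i' };

    strcpy(slot.name, "logical_slot4");
    my_slot = &slot;    /* the worker still holds the half-created slot */
    toy_worker_exit();  /* ShutDownSlotSync() makes the worker exit */
    return slot;        /* copy of the slot's final state */
}
```

In the model, the returned slot still carries its name and 'i' state, but in_use is already false, which is the state slotsync_drop_initiated_slots() later runs into.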

So later on, when we want to drop this slot in slotsync_drop_initiated_slots(),
we get errors like:

2023-11-17 11:22:08.526 UTC [2195486] FATAL: replication slot "logical_slot4" does not exist

The reason is that slotsync_drop_initiated_slots() ends up calling SearchNamedReplicationSlot():

(gdb) bt
#0 SearchNamedReplicationSlot (name=0x7f743f5c9ab8 "logical_slot4", need_lock=false) at slot.c:388
#1 0x0000556ef0974ec1 in ReplicationSlotAcquire (name=0x7f743f5c9ab8 "logical_slot4", nowait=true) at slot.c:484
#2 0x0000556ef09754e7 in ReplicationSlotDrop (name=0x7f743f5c9ab8 "logical_slot4", nowait=true, user_cmd=false) at slot.c:668
#3 0x0000556ef095f0a3 in slotsync_drop_initiated_slots () at slotsync.c:369

which returns NULL if slot->in_use is false.
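The lookup failure can be sketched with a toy slot array. This is a simplified model, not the real slot.c: toy_slots, toy_search_named_slot() and toy_slot_drop() are hypothetical stand-ins for the shared slot array, SearchNamedReplicationSlot() and the ReplicationSlotDrop() -> ReplicationSlotAcquire() path, and the model reports the error through a buffer instead of raising it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_SLOTS 4

typedef struct
{
    bool in_use;    /* models ReplicationSlot.in_use */
    char name[64];  /* models ReplicationSlot.data.name */
} ToySlot;

/* Models the shared-memory slot array. */
static ToySlot toy_slots[TOY_MAX_SLOTS];

/* Models SearchNamedReplicationSlot(): entries not in use are skipped. */
static ToySlot *toy_search_named_slot(const char *name)
{
    for (int i = 0; i < TOY_MAX_SLOTS; i++)
    {
        ToySlot *s = &toy_slots[i];

        if (s->in_use && strcmp(s->name, name) == 0)
            return s;
    }
    return NULL;
}

/*
 * Models ReplicationSlotDrop() -> ReplicationSlotAcquire(): returns false and
 * fills errbuf where the real code would report that the slot does not exist.
 */
static bool toy_slot_drop(const char *name, char *errbuf, size_t errlen)
{
    ToySlot *s = toy_search_named_slot(name);

    if (s == NULL)
    {
        snprintf(errbuf, errlen, "replication slot \"%s\" does not exist", name);
        return false;
    }
    s->in_use = false;
    return true;
}
```

A slot whose entry still holds the name "logical_slot4" but has in_use = false is invisible to the search, so the drop fails with the same message as in the log above.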

One option could be to check that slot->in_use is true before calling ReplicationSlotDrop() here?

+ foreach(lc, slots)
+ {
+ ReplicationSlot *s = (ReplicationSlot *) lfirst(lc);
+
+ ReplicationSlotDrop(NameStr(s->data.name), true, false);
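A rough sketch of the option suggested above, using a hypothetical toy model rather than the actual patch: ToySlot, toy_slot_drop() and toy_drop_initiated_slots() are illustrative names, and the array of pointers stands in for the List of slots iterated by the foreach loop. The model ignores concurrency; in PostgreSQL, lookups such as SearchNamedReplicationSlot() can take ReplicationSlotControlLock (its need_lock argument), which a real check would presumably need to take into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct
{
    bool in_use;    /* models ReplicationSlot.in_use */
    char name[64];  /* models ReplicationSlot.data.name */
} ToySlot;

/* Hypothetical drop: only valid on a slot that is still in use. */
static void toy_slot_drop(ToySlot *s)
{
    assert(s->in_use);  /* the real drop would fail with "does not exist" */
    s->in_use = false;
}

/*
 * Sketch of the guard in the loop of slotsync_drop_initiated_slots():
 * skip entries whose in_use was already cleared (for example by the
 * worker's exit path). Returns the number of slots actually dropped.
 */
static int toy_drop_initiated_slots(ToySlot **slots, int nslots)
{
    int ndropped = 0;

    for (int i = 0; i < nslots; i++)
    {
        ToySlot *s = slots[i];

        if (!s->in_use)
            continue;   /* already dropped: nothing left to clean up */

        toy_slot_drop(s);
        ndropped++;
    }
    return ndropped;
}
```

Skipping rather than erroring keeps the promotion path going when the worker's exit has already cleaned up the half-created slot.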

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
