Re: promotion related handling in pg_sync_replication_slots()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: promotion related handling in pg_sync_replication_slots()
Date: 2024-04-06 06:19:19
Message-ID: CAA4eK1LiZdF9UAL0KTOCKt3rw1sU2cRrV+nwfjipbS2WZQwOCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> Please find v2. Changes are:
> 1) Rebased the patch as there was conflict due to recent commit 6f132ed.
> 2) Added an Assert in update_synced_slots_inactive_since() to ensure
> that the slot does not have active_pid.
> 3) Improved commit msg and comments.
>

Few comments:
==============
1.
void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
+ /*
+ * Startup process signaled the slot sync to stop, so if meanwhile user
+ * has invoked slot sync SQL function, simply return.
+ */
+ SpinLockAcquire(&SlotSyncCtx->mutex);
+ if (SlotSyncCtx->stopSignaled)
+ {
+ ereport(LOG,
+ errmsg("skipping slot synchronization as slot sync shutdown is
signaled during promotion"));
+
+ SpinLockRelease(&SlotSyncCtx->mutex);
+ return;
+ }
+ SpinLockRelease(&SlotSyncCtx->mutex);

There is a race condition with this code. Say during promotion
ShutDownSlotSync() is just before setting this flag and the user has
invoked pg_sync_replication_slots() and passed this check but still
didn't set the SlotSyncCtx->syncing flag. So, now, the promotion would
recognize that there is slot sync going on in parallel, and slot sync
wouldn't know that the promotion is in progress.

The current coding for slot sync worker ensures there is no such race
by checking the stopSignaled and setting the pid together under
spinlock. I think we need to move the setting of the syncing flag
similarly. Once we do that probably checking SlotSyncCtx->syncing
should be sufficient in ShutDownSlotSync(). If we change the location
of setting the 'syncing' flag then please ensure its cleanup as we
currently do in slotsync_failure_callback().

2.
@@ -1395,6 +1395,7 @@ update_synced_slots_inactive_since(void)
if (s->in_use && s->data.synced)
{
Assert(SlotIsLogical(s));
+ Assert(s->active_pid == 0);

We can add a comment like: "The slot must not be acquired by any process"

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-04-06 06:25:38 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Amit Langote 2024-04-06 06:03:13 Re: remaining sql/json patches