Re: promotion related handling in pg_sync_replication_slots()

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: promotion related handling in pg_sync_replication_slots()
Date: 2024-04-12 02:17:39
Message-ID: CAJpy0uA-_rfARiNCfpZ=xdvyet-FpqmDRiiav4n9C=_f0Mcf8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 6, 2024 at 11:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > Please find v2. Changes are:
> > 1) Rebased the patch as there was conflict due to recent commit 6f132ed.
> > 2) Added an Assert in update_synced_slots_inactive_since() to ensure
> > that the slot does not have active_pid.
> > 3) Improved commit msg and comments.
> >
>
> Few comments:
> ==============
> 1.
> void
> SyncReplicationSlots(WalReceiverConn *wrconn)
> {
> + /*
> + * Startup process signaled the slot sync to stop, so if meanwhile user
> + * has invoked slot sync SQL function, simply return.
> + */
> + SpinLockAcquire(&SlotSyncCtx->mutex);
> + if (SlotSyncCtx->stopSignaled)
> + {
> + ereport(LOG,
> + errmsg("skipping slot synchronization as slot sync shutdown is
> signaled during promotion"));
> +
> + SpinLockRelease(&SlotSyncCtx->mutex);
> + return;
> + }
> + SpinLockRelease(&SlotSyncCtx->mutex);
>
> There is a race condition with this code. Say during promotion
> ShutDownSlotSync() is just before setting this flag and the user has
> invoked pg_sync_replication_slots() and passed this check but still
> didn't set the SlotSyncCtx->syncing flag. So, now, the promotion would
> recognize that there is slot sync going on in parallel, and slot sync
> wouldn't know that the promotion is in progress.

Did you mean that now, the promotion *would not* recognize...

I see, I will fix this.

> The current coding for slot sync worker ensures there is no such race
> by checking the stopSignaled and setting the pid together under
> spinlock. I think we need to move the setting of the syncing flag
> similarly. Once we do that probably checking SlotSyncCtx->syncing
> should be sufficient in ShutDownSlotSync(). If we change the location
> of setting the 'syncing' flag then please ensure its cleanup as we
> currently do in slotsync_failure_callback().

Sure, let me review.

> 2.
> @@ -1395,6 +1395,7 @@ update_synced_slots_inactive_since(void)
> if (s->in_use && s->data.synced)
> {
> Assert(SlotIsLogical(s));
> + Assert(s->active_pid == 0);
>
> We can add a comment like: "The slot must not be acquired by any process"
>
> --
> With Regards,
> Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-04-12 02:20:53 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Previous Message Robert Haas 2024-04-12 02:15:27 Re: post-freeze damage control