Re: promotion related handling in pg_sync_replication_slots()

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: promotion related handling in pg_sync_replication_slots()
Date: 2024-04-12 02:27:39
Message-ID: CAJpy0uD-bRDc+rdf5znj7gp40nTq=m-1QE4U6z029tYV+C4xPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > Please find v2. Changes are:
>
> Thanks for the patch. Here are some comments.

Thanks for reviewing.
>
> 1. Can we have a clear saying in the shmem variable who's syncing at
> the moment? Is it a slot sync worker or a backend via SQL function?
> Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
>
> typedef enum SlotSyncSource
> {
> SLOT_SYNC_NONE,
> SLOT_SYNC_WORKER,
> SLOT_SYNC_BACKEND,
> } SlotSyncSource;
>
> Then, the check in ShutDownSlotSync can be:
>
> + /*
> + * Return if neither the slot sync worker is running nor the function
> + * pg_sync_replication_slots() is executing.
> + */
> + if ((SlotSyncCtx->pid == InvalidPid) &&
> SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
> {
>
> 2.
> SyncReplicationSlots(WalReceiverConn *wrconn)
> {
> + /*
> + * Startup process signaled the slot sync to stop, so if meanwhile user
> + * has invoked slot sync SQL function, simply return.
> + */
> + SpinLockAcquire(&SlotSyncCtx->mutex);
> + if (SlotSyncCtx->stopSignaled)
> + {
> + ereport(LOG,
> + errmsg("skipping slot synchronization as slot sync
> shutdown is signaled during promotion"));
> +
>
> Unless I'm missing something, I think this can't detect if the backend
> via SQL function is already half-way through syncing in
> synchronize_one_slot. So, better move this check to (or also have it
> there) slot sync loop that calls synchronize_one_slot. To avoid
> spinlock acquisitions, we can perhaps do this check in when we acquire
> the spinlock for synced flag.

If the sync via SQL function is already half-way, then promotion
should wait for it to finish. I don't think it is a good idea to move
the check to synchronize_one_slot(). The sync-call should either not
start (if it noticed the promotion) or finish the sync and then let
promotion proceed. But I would like to know others' opinion on this.

>
> /* Search for the named slot */
> if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> {
> bool synced;
>
> SpinLockAcquire(&slot->mutex);
> synced = slot->data.synced;
> << get SlotSyncCtx->stopSignaled here >>
> SpinLockRelease(&slot->mutex);
>
> << do the slot sync skip check here if (stopSignaled) >>
>
> 3. Can we have a test or steps at least to check the consequences
> manually one can get if slot syncing via SQL function is happening
> during the promotion?
>
> IIUC, we need to ensure there is no backend acquiring it and
> performing sync while the slot sync worker is shutting down/standby
> promotion is occuring. Otherwise, some of the slots can get resynced
> and some are not while we are shutting down the slot sync worker as
> part of the standby promotion which might leave the slots in an
> inconsistent state.

I do not think that we can reach a state (exception is some error
scenario) where some of the slots are synced while the rest are not
during a *particular* sync-cycle only because promotion is going in
parallel. (And yes we need to fix the race-condition stated by Amit
up-thread for this statement to be true.)

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-04-12 02:31:21 Re: Issue with the PRNG used by Postgres
Previous Message Robert Haas 2024-04-12 02:20:53 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands