Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
Date: 2026-03-30 04:18:19
Message-ID: CABdArM5rrhSmFvVL4C5LL0iea-R0HRtB=ZvD=ereoTDa1Tm=NA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 27, 2026 at 10:49 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Fri, Mar 27, 2026 at 9:38 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > Attached the updated patch.
>
> Thanks for updating the patch! It looks good overall.
>

Thank you Fujii-san for the review.

> Regarding the comments in SlotSyncCtxStruct, since the role of
> stopSignaled field has changed, those comments should be updated
> accordingly? For example,
>
> -------------------------
> - * the SQL function pg_sync_replication_slots(). When the startup process sets
> - * 'stopSignaled' during promotion, it uses this 'pid' to wake up the currently
> - * synchronizing process so that the process can immediately stop its
> - * synchronizing work on seeing 'stopSignaled' set.
> - * Setting 'stopSignaled' is also used to handle the race condition when the
> + * the SQL function pg_sync_replication_slots(). On promotion,
> + * the startup process sets 'stopSignaled' and uses this 'pid' to wake up
> + * the currently synchronizing process so that the process can
> + * immediately stop its synchronizing work.
> + * Setting 'stopSignaled' is used to handle the race condition when the
> -------------------------
>

Updated as suggested.

>
> +/*
> + * Interrupt flag set when PROCSIG_SLOTSYNC_MESSAGE is received, asking the
> + * slotsync worker or pg_sync_replication_slots() to stop because
> + * standby promotion has been triggered.
> + */
> +volatile sig_atomic_t SlotSyncShutdown = false;
>
> For the interrupt flag set in procsignal_sigusr1_handler(), other flags
> use a *Pending suffix (e.g., ProcSignalBarrierPending,
> ParallelApplyMessagePending), so SlotSyncShutdownPending would
> be more consistent.
>
>
> +void
> +HandleSlotSyncMessage(void)
>
> Functions called from ProcessInterrupts() typically use the Process* prefix
> (e.g., ProcessProcSignalBarrier(), ProcessParallelApplyMessages()),
> so ProcessSlotSyncMessage would be more consistent than HandleSlotSyncMessage.
>

Agree, fixed.

>
> + ereport(LOG,
> + errmsg("replication slot synchronization worker will stop because
> promotion is triggered"));
> +
> + proc_exit(0);
> + }
> + else
> + {
> + /*
> + * For the backend executing SQL function
> + * pg_sync_replication_slots().
> + */
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot synchronization will stop because promotion
> is triggered"));
>
> The log messages say "will stop", but since sync hasn't started yet,
> "will not start" seems clearer here. For example, "replication slot
> synchronization worker will not start because promotion was triggered"
> and "replication slot synchronization will not start because promotion was
> triggered". Thought?
>

We were using the same log message in two places:
check_and_set_sync_info() and HandleSlotSyncMessage().
I think “will not start” fits better in the first case, while “will
stop” makes sense to keep in the second.

--
Thanks,
Nisha

Attachment Content-Type Size
v5-0001-Prevent-slotsync-worker-API-hang-during-standby-p.patch application/octet-stream 10.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2026-03-30 04:20:54 Re: Avoid multiple SetLatch() calls in procsignal_sigusr1_handler()
Previous Message Dilip Kumar 2026-03-30 04:11:33 Re: Skipping schema changes in publication