| From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion? |
| Date: | 2026-04-01 05:33:27 |
| Message-ID: | CABdArM5a=PHOjK8esNdKaax1fq36F20Rx68PNvtP=B8uj_b5gg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Mar 31, 2026 at 4:12 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Mar 31, 2026 at 11:35 AM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 30, 2026 at 4:39 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Mar 30, 2026 at 1:18 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > > > We were using the same log message in two places:
> > > > check_and_set_sync_info() and HandleSlotSyncMessage().
> > > > I think “will not start” fits better in the first case, while “will
> > > > stop” makes sense to keep in the second.
> > >
> > > Thanks for updating the patch!
> > >
> > > With the patch, in my testing, standby promotion always produces
> > > the following logs:
> > >
> > > LOG: replication slot synchronization worker will stop because
> > > promotion is triggered
> > > LOG: replication slot synchronization worker will not start
> > > because promotion was triggered
> > >
> > > It looks like the postmaster immediately restarts the slotsync worker after
> > > promotion terminates it, and that new worker then exits on seeing
> > > SlotSyncCtx->stopSignaled.
> > >
> > > IMO, always emitting both messages is a bit confusing. It would be nice to
> > > suppress the second one if possible.
> > >
> > > One idea would be to prevent the restart altogether. For example,
> > > ProcessSlotSyncMessage() could set SlotSyncCtx->last_start_time to
> > > a special value (like -1), and SlotSyncWorkerCanRestart() could return
> > > false (i.e., prevent postmater from starting up slotsync worker) when
> > > it sees that. Alternatively, SlotSyncWorkerCanRestart() could simply
> > > check SlotSyncCtx->stopSignaled.
> > >
> > > That said, as far as I remember correctly, postmaster is generally not
> > > supposed to touch shared memory (per the comments in postmaster.c),
> > > so I'm not sure this approach is acceptable. On the other hand,
> > > postmaster and the slotsync worker already rely on SlotSyncCtx->last_start_time,
> > > so perhaps there's some precedent here.
> > >
> > IIUC, checking SlotSyncCtx->stopSignaled in SlotSyncWorkerCanRestart()
> > may not be ideal, as it requires a spinlock to avoid races with the
> > startup process and it is disallowed to take lock in postmaster main
> > loop. Whereas, SlotSyncCtx->last_start_time doesn’t need a lock since
> > the postmaster accesses it only when the worker is not alive.
> >
>
> I agree.
>
> > Another option could be to log in check_and_set_sync_info() at DEBUG1
> > instead of LOG level. This message appears only after stopSignaled is
> > set, when promotion is already in progress and the first worker has
> > logged “will stop…”. The second worker doesn’t do any real work. Since
> > there’s nothing actionable for users, using DEBUG1 would keep it
> > useful for debugging (e.g., noticing immediate restarts) while
> > avoiding extra log noise. Thoughts?
> >
>
> +1.
>
> Do you think we can slightly tweak the comment in atop file to:
>
> On promotion the startup process sets 'stopSignaled' and uses this
> 'pid' to signal synchronizing process with PROCSIG_SLOTSYNC_MESSAGE
> and also to wake it up so that the process can immediately stop its
> synchronizing work. Setting 'stopSignaled' on the other hand is used
> to handle the race condition....
>
Done.
> Also shall we quick exit ProcessSlotSyncMessage() if syncing is
> already finished by API?
>
Make sense. Fixed.
Please find the updated patch (v7) attached.
--
Thanks,
Nisha
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0001-Prevent-slotsync-worker-API-hang-during-standby-p.patch | application/octet-stream | 12.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nisha Moond | 2026-04-01 05:33:51 | Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion? |
| Previous Message | Bryan Green | 2026-04-01 05:24:06 | [PATCH] BUG: Stale LC_MESSAGES translations after SET on windows. |