Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
Date: 2026-04-01 11:11:35
Message-ID: CABdArM7OsLhyactNNUZ+Wygc_ybv4hhgje9c+h+=PHm3QdS4iQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2026 at 3:19 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Wed, Apr 1, 2026 at 2:34 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 31, 2026 at 3:56 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Tuesday, March 31, 2026 2:02 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > Please find the updated patch (v6) attached.
> > >
> > > Thanks for updating the patch. One minor comment:
> > >
> > > I think we could avoid interrupting and reporting an ERROR when
> > > IsSyncingReplicationSlots() returns false to avoid reporting ERROR unnecessarily
> > > when the slotsync has already finished.
> > >
> >
> > Thanks for the review. Fixed above in v7.
>
> Thanks for updating the patch! It looks good to me, with just a few minor points
> . If those are addressed, I'd like to push it.
>
> + * a new worker (or a new API call) that starts after the old worker was
>
> "API" feels a bit vague. It might be clearer to explicitly say
> "pg_sync_replication_slots()".
>

Done.

> + PROCSIG_SLOTSYNC_MESSAGE, /* ask slotsync worker/API to stop */
>
> "API" here also feels a bit vague. So I'd like to use "ask slot synchronization
> to stop" as the comment, instead.

Done.

> + * We cannot rely solely on 'stopSignaled' here because:
> + * 1) It resides in shared memory and is visible to all processes, so checking
> + * it directly in ProcessInterrupts() would require additional checks to
> + * ensure only the synchronizing process acts on it.
> + * 2) It has different lifetime semantics and cannot be reset after handling,
> + * as it also guards against postmaster and promotion race conditions.
> + * 3) Accessing it requires acquiring a spinlock, which can be too expensive
> + * or undesirable for every ProcessInterrupts() call.
>
> Now that PROCSIG_SLOTSYNC_MESSAGE is in place, using SlotSyncShutdownPending
> is intuitive. So it seems more useful to explain why stopSignaled is still
> needed rather than why SlotSyncShutdownPending is used (i.e., why stopSignaled
> is not sufficient). Since that rationale is already covered in the SlotSyncCtx
> comments, I'd suggest removing this comment block.

Okay, done.

>
> As for backpatching, this looks like it should go back to v17, where slotsync
> was introduced. Thought?

Right, the issue exists in v17 as well.

Attached the updated patch.

--
Thanks,
Nisha

Attachment Content-Type Size
v8-0001-Prevent-slotsync-worker-API-hang-during-standby-p.patch application/octet-stream 11.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2026-04-01 11:15:40 Re: scale parallel_tuple_cost by tuple width
Previous Message Matthias van de Meent 2026-04-01 10:48:08 Online PostgreSQL version() updates