| From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion? |
| Date: | 2026-04-01 11:11:35 |
| Message-ID: | CABdArM7OsLhyactNNUZ+Wygc_ybv4hhgje9c+h+=PHm3QdS4iQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Apr 1, 2026 at 3:19 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Wed, Apr 1, 2026 at 2:34 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 31, 2026 at 3:56 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Tuesday, March 31, 2026 2:02 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > Please find the updated patch (v6) attached.
> > >
> > > Thanks for updating the patch. One minor comment:
> > >
> > > I think we could avoid interrupting and reporting an ERROR when
> > > IsSyncingReplicationSlots() returns false to avoid reporting ERROR unnecessarily
> > > when the slotsync has already finished.
> > >
> >
> > Thanks for the review. Fixed above in v7.
>
> Thanks for updating the patch! It looks good to me, with just a few minor points
> . If those are addressed, I'd like to push it.
>
> + * a new worker (or a new API call) that starts after the old worker was
>
> "API" feels a bit vague. It might be clearer to explicitly say
> "pg_sync_replication_slots()".
>
Done.
> + PROCSIG_SLOTSYNC_MESSAGE, /* ask slotsync worker/API to stop */
>
> "API" here also feels a bit vague. So I'd like to use "ask slot synchronization
> to stop" as the comment, instead.
Done.
> + * We cannot rely solely on 'stopSignaled' here because:
> + * 1) It resides in shared memory and is visible to all processes, so checking
> + * it directly in ProcessInterrupts() would require additional checks to
> + * ensure only the synchronizing process acts on it.
> + * 2) It has different lifetime semantics and cannot be reset after handling,
> + * as it also guards against postmaster and promotion race conditions.
> + * 3) Accessing it requires acquiring a spinlock, which can be too expensive
> + * or undesirable for every ProcessInterrupts() call.
>
> Now that PROCSIG_SLOTSYNC_MESSAGE is in place, using SlotSyncShutdownPending
> is intuitive. So it seems more useful to explain why stopSignaled is still
> needed rather than why SlotSyncShutdownPending is used (i.e., why stopSignaled
> is not sufficient). Since that rationale is already covered in the SlotSyncCtx
> comments, I'd suggest removing this comment block.
Okay, done.
>
> As for backpatching, this looks like it should go back to v17, where slotsync
> was introduced. Thought?
Right, the issue exists in v17 as well.
Attached the updated patch.
--
Thanks,
Nisha
| Attachment | Content-Type | Size |
|---|---|---|
| v8-0001-Prevent-slotsync-worker-API-hang-during-standby-p.patch | application/octet-stream | 11.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrew Dunstan | 2026-04-01 11:15:40 | Re: scale parallel_tuple_cost by tuple width |
| Previous Message | Matthias van de Meent | 2026-04-01 10:48:08 | Online PostgreSQL version() updates |