| From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion? |
| Date: | 2026-04-01 05:03:43 |
| Message-ID: | CABdArM4XdB=vtQMVBBAowgf2PT7V8Dw56LfsYOAfyuxzcda6Ow@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Mar 31, 2026 at 9:03 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Tue, Mar 31, 2026 at 7:42 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > > One idea would be to prevent the restart altogether. For example,
> > > > ProcessSlotSyncMessage() could set SlotSyncCtx->last_start_time to
> > > > a special value (like -1), and SlotSyncWorkerCanRestart() could return
> > > > false (i.e., prevent postmaster from starting up slotsync worker) when
> > > > it sees that. Alternatively, SlotSyncWorkerCanRestart() could simply
> > > > check SlotSyncCtx->stopSignaled.
> > > >
> > > > That said, as far as I remember correctly, postmaster is generally not
> > > > supposed to touch shared memory (per the comments in postmaster.c),
> > > > so I'm not sure this approach is acceptable. On the other hand,
> > > > postmaster and the slotsync worker already rely on SlotSyncCtx->last_start_time,
> > > > so perhaps there's some precedent here.
> > > >
> > > IIUC, checking SlotSyncCtx->stopSignaled in SlotSyncWorkerCanRestart()
> > > may not be ideal, as it requires a spinlock to avoid races with the
> > > startup process, and taking a lock is not allowed in the postmaster main
> > > loop. In contrast, SlotSyncCtx->last_start_time doesn’t need a lock since
> > > the postmaster accesses it only when the worker is not alive.
> > >
> >
> > I agree.
>
> Could you clarify what issue might arise from checking
> SlotSyncCtx->stopSignaled without holding a spinlock in
> SlotSyncWorkerCanRestart()? Is it actually problematic?
>
We might not see issues in practice, since stopSignaled changes only
once (false -> true), so a torn or corrupted value is unlikely.
But without a lock or memory barrier, there is no guarantee that the
postmaster reads the latest value; e.g., on weakly ordered systems
(like ARM64) it may still see a stale value. This means the worker
could be restarted again, and the same unwanted log message may still
appear.
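
To illustrate the ordering concern, here is a minimal standalone sketch
using C11 atomics. It is not PostgreSQL code: DemoSlotSyncCtx and the
demo_* functions are hypothetical stand-ins, and PostgreSQL has its own
atomics and barrier primitives in port/atomics.h. The sketch only shows
how a one-way flag can be read without a lock in a well-defined way:
once the reader observes true, it also sees the writes made before the
store. It does not make the store visible instantaneously.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared-memory control struct. */
typedef struct DemoSlotSyncCtx
{
    atomic_bool stop_signaled;  /* one-way flag: false -> true */
} DemoSlotSyncCtx;

static void
demo_ctx_init(DemoSlotSyncCtx *ctx)
{
    assert(ctx != NULL);
    atomic_init(&ctx->stop_signaled, false);
}

/* Writer side: publish the stop request with release semantics. */
static void
demo_signal_stop(DemoSlotSyncCtx *ctx)
{
    assert(ctx != NULL);
    atomic_store_explicit(&ctx->stop_signaled, true, memory_order_release);
}

/* Reader side: lock-free check with acquire semantics. */
static bool
demo_worker_can_restart(DemoSlotSyncCtx *ctx)
{
    assert(ctx != NULL);
    return !atomic_load_explicit(&ctx->stop_signaled, memory_order_acquire);
}
```

Note that this addresses only the unsynchronized read; it does not
address the separate concern that the postmaster should generally
avoid touching shared memory.
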
> That said, since the postmaster should generally avoid
> touching shared memory, it doesn't seem like a good idea
> for it to check SlotSyncCtx->stopSignaled. So I'm fine with
> instead lowering the log level for the "worker will not start"
> message to DEBUG1.
>
Okay, thanks. I'll share the updated patch soon.
--
Thanks,
Nisha