Quick Links

Re: Logical replication launcher did not automatically restart when got SIGKILL

From:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	cca5507 <cca5507(at)qq(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject:	Re: Logical replication launcher did not automatically restart when got SIGKILL
Date:	2025-07-24 09:46:08
Message-ID:	CAJpy0uCq9bRCrK8eg6TXryQqNY+h3j61Xf11kXjH_V0rs2266Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Jul 24, 2025 at 2:39 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Thu, Jul 17, 2025 at 6:58 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 16, 2025 at 8:51 AM cca5507 <cca5507(at)qq(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?
> > >
> >
> > Shall ReportBackgroundWorkerExit() be skipped for 'crashed' background worker?
> >
> > If we look at code prior to commit 28a520c0b77, there we were setting
> > 'rw_crashed_at' in CleanupBackgroundWorker() and then
> > HandleChildCrash() was resetting the pid and exiting with no
> > additional processing.
>
> It seems we don't need to set rw_crashed_at in crash cases,
> since it's always reset to 0 by ResetBackgroundWorkerCrashTimes()
> in restart-after-crash code.

Yes, that seems the case,

> So, the only additional step we need may be
> resetting rw_pid to 0.
>

I agree.

> Instead of modifying CleanupBackend() to do this, another option
> could be to reset rw_pid during restart-after-crash code, for example,
> inside ResetBackgroundWorkerCrashTimes(). Thought?
>

Sounds reasonable.
Thinking out loud, when cleaning up after a backend or background
worker crash, process_pm_child_exit() is invoked, which subsequently
calls both CleanupBackend() and HandleChildCrash(). After the cleanup
completes, process_pm_child_exit() calls PostmasterStateMachine() to
move to the next state. As part of that, PostmasterStateMachine()
invokes ResetBackgroundWorkerCrashTimes() (only in crash
scenarios/FatalError), to reset a few things. Since it also resets
rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
well there.

thanks
Shveta

In response to

Re: Logical replication launcher did not automatically restart when got SIGKILL at 2025-07-24 09:09:01 from Fujii Masao

Responses

Re: Logical replication launcher did not automatically restart when got SIGKILL at 2025-07-25 01:47:18 from Fujii Masao

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Álvaro Herrera	2025-07-24 09:54:10	Re: trivial grammar refactor
Previous Message	Dave Cramer	2025-07-24 09:34:39	Re: More protocol.h replacements this time into walsender.c