From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | cca5507 <cca5507(at)qq(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Logical replication launcher did not automatically restart when got SIGKILL |
Date: | 2025-07-24 09:46:08 |
Message-ID: | CAJpy0uCq9bRCrK8eg6TXryQqNY+h3j61Xf11kXjH_V0rs2266Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 24, 2025 at 2:39 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Thu, Jul 17, 2025 at 6:58 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 16, 2025 at 8:51 AM cca5507 <cca5507(at)qq(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?
> > >
> >
> > Shall ReportBackgroundWorkerExit() be skipped for 'crashed' background worker?
> >
> > If we look at code prior to commit 28a520c0b77, there we were setting
> > 'rw_crashed_at' in CleanupBackgroundWorker() and then
> > HandleChildCrash() was resetting the pid and exiting with no
> > additional processing.
>
> It seems we don't need to set rw_crashed_at in crash cases,
> since it's always reset to 0 by ResetBackgroundWorkerCrashTimes()
> in restart-after-crash code.
Yes, that seems the case,
> So, the only additional step we need may be
> resetting rw_pid to 0.
>
I agree.
> Instead of modifying CleanupBackend() to do this, another option
> could be to reset rw_pid during restart-after-crash code, for example,
> inside ResetBackgroundWorkerCrashTimes(). Thought?
>
Sounds reasonable.
Thinking out loud, when cleaning up after a backend or background
worker crash, process_pm_child_exit() is invoked, which subsequently
calls both CleanupBackend() and HandleChildCrash(). After the cleanup
completes, process_pm_child_exit() calls PostmasterStateMachine() to
move to the next state. As part of that, PostmasterStateMachine()
invokes ResetBackgroundWorkerCrashTimes() (only in crash
scenarios/FatalError), to reset a few things. Since it also resets
rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
well there.
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Álvaro Herrera | 2025-07-24 09:54:10 | Re: trivial grammar refactor |
Previous Message | Dave Cramer | 2025-07-24 09:34:39 | Re: More protocol.h replacements this time into walsender.c |