Re: Logical replication launcher did not automatically restart when got SIGKILL

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: cca5507 <cca5507(at)qq(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication launcher did not automatically restart when got SIGKILL
Date: 2025-07-24 09:09:01
Message-ID: CAHGQGwGV77nbVfFFYeK96qf+9u8Yw9ddXhN5hcuvuHoSqq110A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 17, 2025 at 6:58 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jul 16, 2025 at 8:51 AM cca5507 <cca5507(at)qq(dot)com> wrote:
> >
> > Hi,
> >
> > The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?
> >
>
> Shall ReportBackgroundWorkerExit() be skipped for 'crashed' background worker?
>
> If we look at code prior to commit 28a520c0b77, there we were setting
> 'rw_crashed_at' in CleanupBackgroundWorker() and then
> HandleChildCrash() was resetting the pid and exiting with no
> additional processing.

It seems we don't need to set rw_crashed_at in crash cases,
since it's always reset to 0 by ResetBackgroundWorkerCrashTimes()
in the restart-after-crash code. So the only additional step we may need
is resetting rw_pid to 0.

Instead of modifying CleanupBackend() to do this, another option
could be to reset rw_pid in the restart-after-crash code, for example,
inside ResetBackgroundWorkerCrashTimes(). Thoughts?
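
To make the second option concrete, here is a rough stand-alone sketch.
It is not postmaster code: ModelBgWorker, model_reset_after_crash() and
model_can_start() are made-up names, and only the rw_pid and
rw_crashed_at fields are modeled. It also assumes that the launch path
skips any worker whose rw_pid is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* pid_t */
#include <time.h>        /* time_t */

/* Hypothetical, cut-down stand-in for the postmaster's per-worker state. */
typedef struct ModelBgWorker
{
    pid_t   rw_pid;         /* 0 means no child process is running */
    time_t  rw_crashed_at;  /* 0 means no crash is recorded */
} ModelBgWorker;

/*
 * Model of the restart-after-crash reset step: clear the crash time and
 * the stale pid so that every worker looks startable again.
 */
void
model_reset_after_crash(ModelBgWorker *workers, size_t nworkers)
{
    assert(workers != NULL || nworkers == 0);

    for (size_t i = 0; i < nworkers; i++)
    {
        workers[i].rw_crashed_at = 0;
        workers[i].rw_pid = 0;      /* the proposed extra step */
    }
}

/*
 * Model of the launch check (an assumption of this sketch): a worker is
 * a start candidate only if it has no live pid and no recorded crash.
 */
int
model_can_start(const ModelBgWorker *w)
{
    return w->rw_pid == 0 && w->rw_crashed_at == 0;
}
```

In this model, a worker left with a stale rw_pid after the crash is
rejected by model_can_start() until model_reset_after_crash() has run,
which is the behavior the reset is meant to fix.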

Regards,

--
Fujii Masao
