Quick Links

Re: Logical replication launcher did not automatically restart when got SIGKILL

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc:	cca5507 <cca5507(at)qq(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Logical replication launcher did not automatically restart when got SIGKILL
Date:	2025-07-25 09:44:35
Message-ID:	CAHGQGwHF-PdUOgiXCH_8K5qBm8b13h0Qt=dSoFXZybXQdbf-tw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Jul 25, 2025 at 5:25 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Fri, Jul 25, 2025 at 7:17 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >
> > On Thu, Jul 24, 2025 at 6:46 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > Sounds reasonable.
> > > Thinking out loud, when cleaning up after a backend or background
> > > worker crash, process_pm_child_exit() is invoked, which subsequently
> > > calls both CleanupBackend() and HandleChildCrash(). After the cleanup
> > > completes, process_pm_child_exit() calls PostmasterStateMachine() to
> > > move to the next state. As part of that, PostmasterStateMachine()
> > > invokes ResetBackgroundWorkerCrashTimes() (only in crash
> > > scenarios/FatalError), to reset a few things. Since it also resets
> > > rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
> > > well there.
> >
> > Thanks!
> > Attached is a patch that fixes the issue by resetting rw_pid in
> > ResetBackgroundWorkerCrashTimes().
> >
>
> The patch LGTM.

Thanks to both ChangAo and Shveta for the review!
I've pushed the patch and back-patched it to v18.

> > We should probably add a regression test for this case,
> > but I'd prefer to commit the fix first and work on the test separately.
> > Andrey Rudometov proposed a test patch in thread [1],
> > which we might use as a starting point.
>
> Sounds good.

This proposed patch adds a new regression test file to verify background
worker restarts when restart_after_crash is enabled. However, since
we already have 013_crash_restart.pl for testing that scenario,
I’m thinking it might be sufficient to check that the logical replication
launcher, i.e., the background worker, restarts properly there.

For example, we could add a check like the following to 013_crash_restart.pl.
Thoughts?

-----------------
is($node->safe_psql('postgres',
"SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type =
'logical replication launcher'"),
't',
'logical replication launcher is running after crash');
-----------------

Regards,

--
Fujii Masao

In response to

Re: Logical replication launcher did not automatically restart when got SIGKILL at 2025-07-25 08:25:03 from shveta malik

Responses

Re: Logical replication launcher did not automatically restart when got SIGKILL at 2025-07-25 13:53:00 from cca5507

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Etsuro Fujita	2025-07-25 09:52:22	Re: Document transition table triggers are not allowed on views/foreign tables
Previous Message	Dave Cramer	2025-07-25 09:39:40	Re: More protocol.h replacements this time into walsender.c