Re: Fix background workers not restarting with restart_after_crash = on

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Andrey Rudometov <unlimitedhikari(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Fix background workers not restarting with restart_after_crash = on
Date:	2025-07-24 09:23:16
Message-ID:	CAHGQGwFm-yPw3W9QQbOnxOWxpA-zz5jh-_bwUAW3yL5TPpD9HA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 11, 2025 at 5:26 PM Andrey Rudometov
<unlimitedhikari(at)gmail(dot)com> wrote:
>
> Good day, hackers.
>
> Reading through changes committed in master, I noticed that after the
> CleanupBackend/CleanupBackgroundWorker refactor, background workers fail to
> start again after postgres restarts with restart_after_crash = on.
>
> The reason is that CleanupBackend and HandleChildCrash no longer set the
> background worker's rw_pid to zero if the backend, well, crashed and failed to
> call shmem_exit and mark its PMChild slot as inactive via
> MarkPostmasterChildInactive.
>
> The suggested solution is to finish CleanupBackend's background worker related
> logic even after treating the child process as crashed. In earlier versions, the
> pids were zeroed in HandleChildCrash anyway, so there should be no harm in doing
> the same here.
>
> For fast reproduction I used the pg_prewarm extension, as it creates an observable
> bgworker and is present in the postgres tree, so a TAP test is easy to run.

Thanks for the report and patch! This same issue was also reported in
thread [1], where there's ongoing discussion about how to address it.

Regards,

[1] https://postgr.es/m/tencent_E00A056B3953EE6440F0F40F80EC30427D09@qq.com

--
Fujii Masao

In response to

Fix background workers not restarting with restart_after_crash = on at 2025-06-11 08:26:01 from Andrey Rudometov
