Re: Preventing hangups in bgworker start/stop during DB shutdown

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Preventing hangups in bgworker start/stop during DB shutdown
Date: 2020-12-23 04:33:14
Message-ID: CAGRY4nxAD8NS6t9ExLKksnqxrgc7fV-wFrJytsPPPq4p3TC-Xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 23 Dec 2020 at 05:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Here's an attempt at closing the race condition discussed in [1]
> (and in some earlier threads, though I'm too lazy to find them).
>
> The core problem is that the bgworker management APIs were designed
> without any thought for exception conditions, notably "we're not
> gonna launch any more workers because we're shutting down the database".
> A process waiting for a worker in WaitForBackgroundWorkerStartup or
> WaitForBackgroundWorkerShutdown will wait forever, so that the database
> fails to shut down without manual intervention.
>
> I'd supposed that we would need some incompatible changes in those APIs
> in order to fix this, but after further study it seems like we could
> hack things by just acting as though a request that won't be serviced
> has already run to completion. I'm not terribly satisfied with that
> as a long-term solution --- it seems to me that callers should be told
> that there was a failure. But this looks to be enough to solve the
> lockup condition for existing callers, and it seems like it'd be okay
> to backpatch.
>
> Thoughts?
>

Callers who launch bgworkers already have to cope with conditions such as
the worker failing immediately after launch, or before attaching to the
shmem segment used for worker management by whatever extension is launching
it.

So I think it's reasonable to lie and say we launched it. The caller must
already cope with this case to behave correctly.

Patch specifics:

> This function should only be called from the postmaster

It'd be good to

Assert(IsPostmasterEnvironment && !IsUnderPostmaster)

in these functions.

Otherwise at first read the patch and rationale looks sensible to me.

(When it comes to the bgw APIs in general I have a laundry list of things
I'd like to change or improve around signal handling, error trapping and
recovery, and lots more, but that's for another thread.)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-12-23 05:07:49 Cleanup some -I$(libpq_srcdir) in makefiles
Previous Message tsunakawa.takay@fujitsu.com 2020-12-23 04:22:19 RE: [Patch] Optimize dropping of relation buffers using dlist