Re: Missed check for too-many-children in bgworker spawning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Missed check for too-many-children in bgworker spawning
Date: 2019-10-07 19:55:58
Message-ID: CA+TgmoZ4ha+jv7o_NzvzecXLkZpxPEbswVB=wZAp9rLmBnqG6A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 6, 2019 at 1:17 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Over in [1] we have a report of a postmaster shutdown that seems to
> have occurred because some client logic was overaggressively spawning
> connection requests, causing the postmaster's child-process arrays to
> be temporarily full, and then some parallel query tried to launch a
> new bgworker process. The postmaster's bgworker-spawning logic lacks
> any check for the arrays being full, so when AssignPostmasterChildSlot
> failed to find a free slot, kaboom!
>
> The attached proposed patch fixes this by making bgworker spawning
> include a canAcceptConnections() test. That's perhaps overkill, since
> we really just need to check the CountChildren() total; but I judged
> that going through that function and having it decide what to test or
> not test was a better design than duplicating the CountChildren() test
> elsewhere.
>
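
Just so I'm sure we're talking about the same shape of fix: as I read it, the
problem is that the bgworker-start path reaches AssignPostmasterChildSlot()
without ever asking whether the child arrays have room, so a momentarily full
array turns into the "kaboom" you describe.  I'm assuming the check ends up
looking roughly like this (untested sketch; the helper name and its placement
are mine, not the actual patch):

    /* in postmaster.c, consulted before forking a new bgworker */
    static bool
    bgworker_launch_allowed(void)
    {
        /*
         * Run the same gatekeeper the client-backend path uses.  If the
         * child-process arrays are momentarily full, report "not now" so
         * the launch can be retried later, rather than letting
         * AssignPostmasterChildSlot() hit its can't-happen case and take
         * the postmaster down with it.
         */
        if (canAcceptConnections() != CAC_OK)
            return false;

        return true;
    }

with do_start_bgworker() treating a false return like any other transient
start failure.
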
> I'd first imagined also replacing the one-size-fits-all check
>
>     if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())
>         result = CAC_TOOMANY;
>
> with something like
>
>     switch (backend_type)
>     {
>         case BACKEND_TYPE_NORMAL:
>             if (CountChildren(backend_type) >= 2 * MaxConnections)
>                 result = CAC_TOOMANY;
>             break;
>         case BACKEND_TYPE_AUTOVAC:
>             if (CountChildren(backend_type) >= 2 * autovacuum_max_workers)
>                 result = CAC_TOOMANY;
>             break;
>         ...
>     }
>
> so as to subdivide the pool of child-process slots and prevent client
> requests from consuming slots meant for background processes. But on
> closer examination that's not really worth the trouble, because this
> pool is already considerably bigger than MaxBackends; so even if we
> prevented a failure here we could still have bgworker startup failure
> later on when it tries to acquire a PGPROC.
>
> Barring objections, I'll apply and back-patch this soon.

I think it used to work this way -- not sure if it was ever committed
this way, but it at least did during development -- and we ripped it
out because somebody (Magnus?) pointed out that if you got close to
the connection limit, you could see parallel queries start failing,
and that would suck. Falling back to non-parallel seems more OK in
that situation than actually failing.
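
To spell out what "falling back" means on the caller side, the leader-side
pattern is roughly the following (a sketch from memory rather than the actual
LaunchParallelWorkers() code; run_serially() is just a stand-in for "do the
work in the leader"):

    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    worker.bgw_notify_pid = MyProcPid;
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "ParallelWorkerMain");
    snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker");

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
    {
        /*
         * No worker slot available right now.  Don't throw an error:
         * run the plan with fewer (or zero) workers.  That's the
         * behavior we'd like to keep when the postmaster's child
         * arrays are briefly full, too.
         */
        run_serially();
    }

That's the property I'd like to preserve: a transient shortage of child
slots should translate into "fewer workers," not into a query error, and
certainly not into a postmaster exit.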

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
