Re: Unportable implementation of background worker start

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unportable implementation of background worker start
Date: 2017-04-21 15:19:41
Message-ID: 20170421151941.45njrtwykn5dd476@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:

> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptable. But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> actively executing (perhaps because a connection request just came in).
> The postmaster has signals blocked, so nothing happens for the moment.
> When we go around the loop and reach
>
> PG_SETMASK(&UnBlockSig);
>
> the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
> bgworker start requests, and services just one of them. Then control
> returns and proceeds to
>
> selres = select(nSockets, &rmask, NULL, NULL, &timeout);
>
> But now there's no interrupt pending. So the remaining start requests
> do not get serviced until (a) some other postmaster interrupt arrives,
> or (b) the one-minute timeout elapses. They could be waiting awhile.
>
> Bottom line is that any request for more than one bgworker at a time
> faces a non-negligible risk of suffering serious latency.

Interesting. It's hard to hit, for sure.

> I'm coming back to the idea that at least in the back branches, the
> thing to do is allow maybe_start_bgworker to start multiple workers.
>
> Is there any actual evidence for the claim that that might have
> bad side effects?

Well, I ran tests with a few dozen thousand sample workers and the
neglect for other things (such as connection requests) was visible, but
that's probably not a scenario many servers run often currently. I
don't strongly object to the idea of removing the "return" in older
branches, since it's evidently a problem. However, as bgworkers start
to be used more, I think we should definitely have some protection. In
a system with a large number of workers available for parallel queries,
it seems possible for a high velocity server to get stuck in the loop
for some time. (I haven't actually verified this, though. My
experiments were with the early kind, static bgworkers.)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-04-21 16:50:04 Re: Unportable implementation of background worker start
Previous Message Tom Lane 2017-04-21 15:09:16 Re: Unportable implementation of background worker start