Re: Unportable implementation of background worker start

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unportable implementation of background worker start
Date: 2017-04-21 15:09:16
Message-ID: 13882.1492787356@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
>> My first reaction was that that sounded like a lot more work than removing
>> two lines from maybe_start_bgworker and adjusting some comments. But on
>> closer inspection, the slow-bgworker-start issue isn't the only problem
>> here.

> FWIW, I vaguely remember somewhat related issues on x86/linux too.

After sleeping and thinking more, I've realized that the
slow-bgworker-start issue actually exists on *every* platform, it's just
harder to hit when select() is interruptable. But consider the case
where multiple bgworker-start requests arrive while ServerLoop is
actively executing (perhaps because a connection request just came in).
The postmaster has signals blocked, so nothing happens for the moment.
When we go around the loop and reach

PG_SETMASK(&UnBlockSig);

the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
bgworker start requests, and services just one of them. Then control
returns and proceeds to

selres = select(nSockets, &rmask, NULL, NULL, &timeout);

But now there's no interrupt pending. So the remaining start requests
do not get serviced until (a) some other postmaster interrupt arrives,
or (b) the one-minute timeout elapses. They could be waiting awhile.

Bottom line is that any request for more than one bgworker at a time
faces a non-negligible risk of suffering serious latency.

I'm coming back to the idea that at least in the back branches, the
thing to do is allow maybe_start_bgworker to start multiple workers.
Is there any actual evidence for the claim that that might have
bad side effects?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-04-21 15:19:41 Re: Unportable implementation of background worker start
Previous Message Tom Lane 2017-04-21 14:45:19 Re: Old versions of Test::More