Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-03-18 15:44:56
Message-ID: CA+TgmobCszi7MH9R5OO=HfNMOmrYy8ZErdeJA4FbCWMyesekhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 18, 2015 at 2:22 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Can you try this:
>>
>> diff --git a/src/backend/postmaster/bgworker.c
>> b/src/backend/postmaster/bgworker.c
>> index f80141a..39b919f 100644
>> --- a/src/backend/postmaster/bgworker.c
>> +++ b/src/backend/postmaster/bgworker.c
>> @@ -244,6 +244,8 @@ BackgroundWorkerStateChange(void)
>> rw->rw_terminate = true;
>> if (rw->rw_pid != 0)
>> kill(rw->rw_pid, SIGTERM);
>> + else
>> + ReportBackgroundWorkerPID(rw);
>> }
>> continue;
>> }
>>
>
> It didn't fix the problem. IIUC, you have done this to ensure that
> if worker is not already started, then update it's pid, so that we
> can get the required status in WaitForBackgroundWorkerShutdown().
> As this is a timing issue, it can so happen that before Postmaster
> gets a chance to report the pid, backend has already started waiting
> on WaitLatch().

I think I figured out the problem. That fix only helps in the case
where the postmaster noticed the new registration previously but
didn't start the worker, and then later notices the termination.
What's much more likely to happen is that the worker is started and
terminated so quickly that both happen before we create a
RegisteredBgWorker for it. The attached patch fixes that case, too.

Assuming this actually fixes the problem, I think we should back-patch
it into 9.4. To recap, the problem is that, at present, if you
register a worker and then terminate it before it's launched,
GetBackgroundWorkerPid() will still return BGWH_NOT_YET_STARTED, which
it makes it seem like we're still waiting for it to start. But when
or if the slot is reused for an unrelated registration, then
GetBackgroundWorkerPid() will switch to returning BGWH_STOPPED. It's
hard to believe that's the behavior anyone wants. With this patch,
the return value will always be BGWH_STOPPED in this situation. That
has the virtue of being consistent, and practically speaking I think
it's the behavior that everyone will want, because the case where this
matters is when you are waiting for workers to start or waiting for
worker to stop, and in either case you will want to treat a worker
that was marked for termination before the postmaster actually started
it as already-stopped rather than not-yet-started.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
stop-notify-fix-v2.patch binary/octet-stream 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2015-03-18 15:48:48 pg9.4 relpages of child tables
Previous Message Thom Brown 2015-03-18 14:06:47 Re: GSoC 2015 - mentors, students and admins.