Re: strange parallel query behavior after OOM crashes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-04-05 09:37:40
Message-ID: 24a143e0-c9bb-8f4f-1472-9d6faae4c92e@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>>
>>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
>>> wrote:
>>>>
>>>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
>>>> wrote:
>>>>>
>>>>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>>>> <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>>>>>>
>>>>>> 2. the server restarts automatically, initialize
>>>>>> BackgroundWorkerData->parallel_register_count and
>>>>>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>>>>>> After that, it calls ForgetBackgroundWorker and it increments
>>>>>> parallel_terminate_count.
>>>>>
>>>>>
>>>>> Hmm. So this seems like the root of the problem. Presumably those
>>>>> things need to be reset AFTER forgetting any background workers from
>>>>> before the crash.
>>>>>
>>>> IMHO, the fix would be not to increase the terminated parallel worker
>>>> count whenever ForgetBackgroundWorker is called due to a bgworker
>>>> crash. I've attached a patch for the same. PFA.
>>>
>>>
>>> While I'm not opposed to that approach, I don't think this is a good
>>> way to implement it. If you want to pass an explicit flag to
>>> ForgetBackgroundWorker telling it whether or not it should performing
>>> the increment, fine. But with what you've got here, you're
>>> essentially relying on "spooky action at a distance". It would be
>>> easy for future code changes to break this, not realizing that
>>> somebody's got a hard dependency on 0 having a specific meaning.
>>>
>>
>> I'm probably missing something, but I don't quite understand how these
>> values actually survive the crash. I mean, what I observed is OOM followed
>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>> some reason?
> AFAICU, during crash recovery, we wait for all non-syslogger children
> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
> StartupDataBase. While starting the startup process we check if any
> bgworker is scheduled for a restart. If a bgworker is crashed and not
> meant for a restart(parallel worker in our case), we call
> ForgetBackgroundWorker() to discard it.
>

OK, so essentially we reset the counters, getting

parallel_register_count = 0
parallel_terminate_count = 0

and then ForgetBackgroundWworker() runs and increments the
terminate_count, breaking the logic?

And you're using (parallel_register_count > 0) to detect whether we're
still in the init phase or not. Hmmm.

>>
>> In any case, the comment right before BackgroundWorkerArray says this:
>>
>> * These counters can of course overflow, but it's not important here
>> * since the subtraction will still give the right number.
>>
>> which means that this assert
>>
>> + Assert(BackgroundWorkerData->parallel_register_count >=
>> + BackgroundWorkerData->parallel_terminate_count);
>>
>> is outright broken, just like any other attempts to rely on simple
>> comparisons of these two counters, no?
>>
> IIUC, the assert statements ensures that register count should always
> be greater than or equal to the terminate count.
> Whereas, the comment refers to the fact that register count and
> terminate count indicate the total number of parallel workers spawned
> and terminated respectively since the server has been started. At
> every session, we're not resetting the counts, hence they can
> overflow. But, their substraction should give you the number of active
> parallel worker at any instance.
> So, I'm not able to see how the assert is broken w.r.t the aforesaid
> comment. Am I missing something here?
>

The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values

parallel_register_count = UINT_MAX + 1 = 1
parallel_terminate_count = UINT_MAX

which is fine, because the C handles the overflow during subtraction and so

parallel_register_count - parallel_terminate_count = 1

But the assert is not doing subtraction, it's comparing the values
directly:

Assert(parallel_register_count >= parallel_terminate_count);

and the (perfectly valid) values trivially violate this comparison.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-04-05 09:41:29 Re: multivariate statistics (v25)
Previous Message Amit Khandekar 2017-04-05 09:22:38 Re: Parallel Append implementation