Re: strange parallel query behavior after OOM crashes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-04-05 12:42:20
Message-ID: eaf07984-078f-1918-1304-237934b9fe20@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/05/2017 01:09 PM, Kuntal Ghosh wrote:
> On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>>
>>>> The comment says that the counters are allowed to overflow, i.e. after a
>>>> long uptime you might get these values
>>>>
>>>> parallel_register_count = UINT_MAX + 1 = 1
>>>> parallel_terminate_count = UINT_MAX
>>>>
>>>> which is fine, because the C handles the overflow during subtraction and
>>>> so
>>>>
>>>> parallel_register_count - parallel_terminate_count = 1
>>>>
>>>> But the assert is not doing subtraction, it's comparing the values
>>>> directly:
>>>>
>>>> Assert(parallel_register_count >= parallel_terminate_count);
>>>>
>>>> and the (perfectly valid) values trivially violate this comparison.
>>>>
>>> Thanks for the explanation. So, we can't use the above assert
>>> statement. Even the following assert statement will not be helpful:
>>> Assert(parallel_register_count - parallel_terminate_count >= 0);
>>> Because, it'll fail to track the scenario when parallel_register_count
>>> is not overflowed, still less than parallel_terminate_count. :(
>>>
>>
>> Actually, that assert would work, because C does handle overflows on uint
>> values during subtraction just fine. That is,
>>
>> (UINT_MAX+x) - UINT_MAX = x
>>
>> Also, the comment about overflows before BackgroundWorkerArray claims this
>> is the case.
>>
> Agreed on the overflowed case. But, my concern is when an overflow has
> not yet happened:
>
> Suppose,
> uint parallel_register_count = 1; /* Didn't overflow* /
> uint parallel_terminate_count = 2; /* Didn't overflow */
>
> Assert(parallel_register_count - parallel_terminate_count >= 0);
> We want the assert statement to fail here, but I think it won't since
> -1 has a valid representation in unsigned int and it is greater than
> 0, no?

I don't follow. How exactly do you get into this situation, assuming you
actually enforce the (register_count > terminate_count) invariant
consistently? In the modulo arithmetic of course.

But now that I'm thinking about it, the subtraction actually happens in
unsigned ints, so the result will be unsigned int too, i.e. always >=0.
Whether it makes sense depends on the invariant being true.

But I think this would work:

Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);

If the difference gets 'negative' and wraps around, it'll be close to
UINT_MAX.

But man, this unsigned int arithmetic makes my brain hurt ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-04-05 12:57:13 Re: Functions Immutable but not parallel safe?
Previous Message Peter Eisentraut 2017-04-05 12:36:44 Re: increasing the default WAL segment size