Re: strange parallel query behavior after OOM crashes

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-04-05 11:09:40
Message-ID: CAGz5QCK7TjMukGnu4v9cUd1huJOrZ0G1zc7vbgxKzshfP65x4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>> The comment says that the counters are allowed to overflow, i.e. after a
>>> long uptime you might get these values
>>>
>>> parallel_register_count = UINT_MAX + 1 = 1
>>> parallel_terminate_count = UINT_MAX
>>>
>>> which is fine, because the C handles the overflow during subtraction and
>>> so
>>>
>>> parallel_register_count - parallel_terminate_count = 1
>>>
>>> But the assert is not doing subtraction, it's comparing the values
>>> directly:
>>>
>>> Assert(parallel_register_count >= parallel_terminate_count);
>>>
>>> and the (perfectly valid) values trivially violate this comparison.
>>>
>> Thanks for the explanation. So, we can't use the above assert
>> statement. Even the following assert statement will not be helpful:
>> Assert(parallel_register_count - parallel_terminate_count >= 0);
>> Because, it'll fail to track the scenario when parallel_register_count
>> is not overflowed, still less than parallel_terminate_count. :(
>>
>
> Actually, that assert would work, because C does handle overflows on uint
> values during subtraction just fine. That is,
>
> (UINT_MAX+x) - UINT_MAX = x
>
> Also, the comment about overflows before BackgroundWorkerArray claims this
> is the case.
>
Agreed on the overflowed case. But, my concern is when an overflow has
not yet happened:

Suppose,
uint parallel_register_count = 1; /* Didn't overflow* /
uint parallel_terminate_count = 2; /* Didn't overflow */

Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't since
-1 has a valid representation in unsigned int and it is greater than
0, no?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-04-05 11:12:24 Re: increasing the default WAL segment size
Previous Message Tomas Vondra 2017-04-05 10:43:05 Re: strange parallel query behavior after OOM crashes