Re: strange parallel query behavior after OOM crashes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Neha Khatri <nehakhatri5(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-04-10 23:17:53
Message-ID: f7a4a45c-a591-8d62-71c6-6fe7ec4dc7cf@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/10/2017 01:39 PM, Kuntal Ghosh wrote:
> On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5(at)gmail(dot)com> wrote:
>>> The problem here seem to be the change in the max_parallel_workers value
>>> while the parallel workers are still under execution. So this poses two
>>> questions:
>>>
>>> 1. From usecase point of view, why could there be a need to tweak the
>>> max_parallel_workers exactly at the time when the parallel workers are at
>>> play.
>>> 2. Could there be a restriction on tweaking of max_parallel_workers while
>>> the parallel workers are at play? At least do not allow setting the
>>> max_parallel_workers less than the current # of active parallel workers.
>>
>> Well, that would be letting the tail wag the dog. The maximum value
>> of max_parallel_workers is only 1024, and what we're really worried
>> about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
>> daylight. How about just creating a #define that's used by guc.c as
>> the maximum for the GUC, and here we assert that we're <= that value?
>>
> I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
> asserted that the absolute difference between the two counts <= that
> value, i.e.,
> abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;
>
> Because, it's possible that register count has overflowed but
> terminate count has not yet overflowed. As a result, the difference in
> unsigned integer will be near UINT32_MAX. Hence, we need the absolute
> difference after typecasting the same to integer. This value should be
> less than the max_parallel_workers upper limit.
>
> I've attached both the patches for better accessibility. PFA.
>

At first I was like 'WTF? Why do we need a new GUC just becase of an
assert?' but you're actually not adding a new GUC parameter, you're
adding a constant which is then used as a max value for max for the two
existing parallel GUCs.

I think this is fine.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2017-04-10 23:19:27 Re: Patch: Write Amplification Reduction Method (WARM)
Previous Message Tom Lane 2017-04-10 22:23:33 Re: [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...