Re: strange parallel query behavior after OOM crashes

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Neha Khatri <nehakhatri5(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-04-10 11:39:50
Message-ID: CAGz5QCKg1S7SDdv6tjAdvCeWre-EeJw+=ytDLprXzrH2iVaddw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5(at)gmail(dot)com> wrote:
>> The problem here seem to be the change in the max_parallel_workers value
>> while the parallel workers are still under execution. So this poses two
>> questions:
>>
>> 1. From usecase point of view, why could there be a need to tweak the
>> max_parallel_workers exactly at the time when the parallel workers are at
>> play.
>> 2. Could there be a restriction on tweaking of max_parallel_workers while
>> the parallel workers are at play? At least do not allow setting the
>> max_parallel_workers less than the current # of active parallel workers.
>
> Well, that would be letting the tail wag the dog. The maximum value
> of max_parallel_workers is only 1024, and what we're really worried
> about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
> daylight. How about just creating a #define that's used by guc.c as
> the maximum for the GUC, and here we assert that we're <= that value?
>
I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts <= that
value, i.e.,
abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;

Because, it's possible that register count has overflowed but
terminate count has not yet overflowed. As a result, the difference in
unsigned integer will be near UINT32_MAX. Hence, we need the absolute
difference after typecasting the same to integer. This value should be
less than the max_parallel_workers upper limit.

I've attached both the patches for better accessibility. PFA.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Fix-parallel-worker-counts-after-a-crash.patch binary/octet-stream 3.8 KB
0002-Add-GUC-for-max_parallel_workers-upper-limit.patch binary/octet-stream 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Borodin 2017-04-10 11:53:17 Merge join for GiST
Previous Message Tom Lane 2017-04-10 11:35:02 Re: src/interfaces/libpq shipping nmake-related Makefiles