Re: strange parallel query behavior after OOM crashes

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-03-30 21:29:55
Message-ID: CAGz5QC+G=BdPZbuKq+GHZDFtiQyhHr1iBGFaUQKd3uZX0FNe9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
<kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>
> 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
> any parallel query.
> In LaunchParallelWorkers, you can see
> nworkers = n nworkers_launched = n (n>0)
> But, all the workers will crash because of the assert statement.
> 2. the server restarts automatically, initialize
> BackgroundWorkerData->parallel_register_count and
> BackgroundWorkerData->parallel_terminate_count in the shared memory.
> After that, it calls ForgetBackgroundWorker and it increments
> parallel_terminate_count. In LaunchParallelWorkers, we have the
> following condition:
> if ((BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count) >=
> max_parallel_workers)
> DO NOT launch any parallel worker.
> Hence, nworkers = n nworkers_launched = 0.
parallel_register_count and parallel_terminate_count, both are
unsigned integer. So, whenever the difference is negative, it'll be a
well-defined unsigned integer and certainly much larger than
max_parallel_workers. Hence, no workers will be launched. I've
attached a patch to fix this.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
fix-workers-launched.patch binary/octet-stream 599 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2017-03-30 21:44:22 Re: Logical replication existing data copy
Previous Message Anastasia Lubennikova 2017-03-30 21:22:27 Re: WIP: Covering + unique indexes.