Re: strange parallel query behavior after OOM crashes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange parallel query behavior after OOM crashes
Date: 2017-04-04 16:52:10
Message-ID: CA+TgmoZ6P_DmvvL7coHpXgUB7pSw7Yg=tVO=ALeZfSTtsEEHTg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>> <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>>> 2. the server restarts automatically, initialize
>>> BackgroundWorkerData->parallel_register_count and
>>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>>> After that, it calls ForgetBackgroundWorker and it increments
>>> parallel_terminate_count.
>>
>> Hmm. So this seems like the root of the problem. Presumably those
>> things need to be reset AFTER forgetting any background workers from
>> before the crash.
>>
> IMHO, the fix would be not to increase the terminated parallel worker
> count whenever ForgetBackgroundWorker is called due to a bgworker
> crash. I've attached a patch for the same. PFA.

While I'm not opposed to that approach, I don't think this is a good
way to implement it. If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine. But with what you've got here, you're
essentially relying on "spooky action at a distance". It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.
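
To illustrate the explicit-flag idea, here is a minimal standalone sketch.
It is not the actual PostgreSQL code: WorkerData, WorkerSlot, MAX_SLOTS,
register_parallel_worker, reset_counters, forget_worker and the
count_termination parameter are all hypothetical stand-ins, and the real
ForgetBackgroundWorker has a different signature. The point is only that
the caller states whether the terminate count should be incremented,
rather than the callee inferring it from some field happening to be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 8				/* hypothetical; not PostgreSQL's real limit */

/* Simplified stand-in for one background worker entry. */
typedef struct WorkerSlot
{
	bool		in_use;
	bool		is_parallel;	/* worker counts against the parallel limit */
} WorkerSlot;

/* Simplified stand-in for the shared bookkeeping counters. */
typedef struct WorkerData
{
	uint32_t	parallel_register_count;
	uint32_t	parallel_terminate_count;
	WorkerSlot	slots[MAX_SLOTS];
} WorkerData;

/* Register a parallel worker in the given slot and count it. */
void
register_parallel_worker(WorkerData *data, int slotno)
{
	assert(slotno >= 0 && slotno < MAX_SLOTS);
	data->slots[slotno].in_use = true;
	data->slots[slotno].is_parallel = true;
	data->parallel_register_count++;
}

/* Mimic a crash restart: the counters are reinitialized to zero. */
void
reset_counters(WorkerData *data)
{
	data->parallel_register_count = 0;
	data->parallel_terminate_count = 0;
}

/*
 * Forget a worker.  The caller says explicitly whether this termination
 * should be counted; nothing is inferred from a magic value.
 */
void
forget_worker(WorkerData *data, int slotno, bool count_termination)
{
	WorkerSlot *slot;

	assert(slotno >= 0 && slotno < MAX_SLOTS);
	slot = &data->slots[slotno];

	if (count_termination && slot->in_use && slot->is_parallel)
		data->parallel_terminate_count++;

	slot->in_use = false;
	slot->is_parallel = false;
}

/* Active parallel workers = registered minus terminated. */
uint32_t
active_parallel_workers(const WorkerData *data)
{
	return data->parallel_register_count - data->parallel_terminate_count;
}
```

In this sketch, a normal worker exit would call forget_worker(data, slot,
true), while cleanup of a worker left over from before a crash restart
would pass false, so the freshly reset counters stay balanced. Passing
true in the second case would leave the terminate count above the
register count, and the unsigned subtraction would wrap around.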

BTW, if this isn't on the open items list, it should be. It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
