Re: Instability in select_parallel regression test

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Instability in select_parallel regression test
Date: 2017-02-17 15:45:35
Message-ID: 19120.1487346335@sss.pgh.pa.us
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> On Fri, Feb 17, 2017 at 11:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In short, it looks to me like ExecShutdownGatherWorkers doesn't actually
>> wait for parallel workers to finish (as its comment suggests is
>> necessary), so that on not-too-speedy machines the worker slots may all
>> still be in use when the next command wants some.

> ExecShutdownGatherWorkers() does wait for workers to exit/finish, but it
> doesn't wait for the postmaster to free the used slots, and that is how
> that API is supposed to work. There is a good chance that on slow
> machines the slots get freed up much later by the postmaster after the
> workers have exited.
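To make the timing gap concrete, here is a minimal single-threaded toy model of the slot lifecycle as described in the quoted text: a worker's exit only marks its slot as terminated, and the slot becomes reusable only when the postmaster later runs a cleanup pass. All names (`WorkerSlot`, `register_worker`, `worker_exit`, `postmaster_reap`, `MAX_WORKER_SLOTS`) are hypothetical and are not PostgreSQL's actual background-worker code; shared memory and locking are deliberately left out.

```c
/*
 * Toy model (hypothetical names, not PostgreSQL source) of the design
 * described above: worker exit does not free the slot; only a later
 * postmaster pass does. Single-threaded; no shared memory or locking.
 */
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKER_SLOTS 2 /* hypothetical pool size */

typedef struct
{
    bool in_use;     /* slot is claimed by a registered worker */
    bool terminated; /* worker process has exited */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKER_SLOTS];

/* Claim the first free slot; returns its index, or -1 if all are in use. */
static int register_worker(void)
{
    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            slots[i].terminated = false;
            return i;
        }
    }
    return -1;
}

/* Worker exit: the worker is gone, but its slot is still counted as in use. */
static void worker_exit(int slot)
{
    slots[slot].terminated = true;
}

/* Postmaster cleanup pass: only here do terminated slots become free. */
static int postmaster_reap(void)
{
    int freed = 0;

    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (slots[i].in_use && slots[i].terminated)
        {
            slots[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```

In this model, if two workers register and both exit, a further `register_worker()` call still returns -1 until `postmaster_reap()` has run; that window is the kind of delay that could leave a following query without worker slots.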

That seems like a seriously broken design to me, first because it can make
for a significant delay in the slots becoming available (which is what's
evidently causing these regression failures), and second because it's
simply bad design to load extra responsibilities onto the postmaster.
Especially ones that involve touching shared memory.

I think this needs to be changed, and promptly. Why in the world don't
you simply have the workers clear their slots when they exit?
We don't have an expectation that regular backends are incompetent to
clean up after themselves. (Obviously, a crash exit is a different
case.)
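A minimal single-threaded toy model of the alternative suggested here, under stated assumptions: a worker that exits normally releases its own slot immediately, and only a crashed worker's slot is left for the postmaster to reclaim. The names (`WorkerSlot`, `register_worker`, `worker_exit_clean`, `worker_exit_crash`, `postmaster_reap`, `MAX_WORKER_SLOTS`) are hypothetical, not PostgreSQL APIs. A real implementation would also need proper synchronization on the shared slot array, which this sketch omits.

```c
/*
 * Toy model (hypothetical names, not PostgreSQL source) of the proposal:
 * a clean exit frees the slot at once; the postmaster only cleans up
 * after crashed workers. Single-threaded; no shared memory or locking.
 */
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKER_SLOTS 2 /* hypothetical pool size */

typedef struct
{
    bool in_use;  /* slot is claimed by a registered worker */
    bool crashed; /* worker exited abnormally; needs postmaster cleanup */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKER_SLOTS];

/* Claim the first free slot; returns its index, or -1 if all are in use. */
static int register_worker(void)
{
    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            slots[i].crashed = false;
            return i;
        }
    }
    return -1;
}

/* Clean exit: the worker releases its own slot, so it is reusable at once. */
static void worker_exit_clean(int slot)
{
    slots[slot].in_use = false;
}

/* Crash exit: the worker cannot clean up, so its slot stays claimed. */
static void worker_exit_crash(int slot)
{
    slots[slot].crashed = true;
}

/* Postmaster pass: reclaims only the slots of crashed workers. */
static int postmaster_reap(void)
{
    int freed = 0;

    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (slots[i].in_use && slots[i].crashed)
        {
            slots[i].in_use = false;
            slots[i].crashed = false;
            freed++;
        }
    }
    return freed;
}
```

Here a slot released by `worker_exit_clean()` can be claimed again immediately, with no postmaster involvement; the postmaster's role shrinks to the crash case.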

> I think what we need to do
> here is to move the test that needs workers to execute before other
> parallel query tests where there is no such requirement.

That's not fixing the problem, it's merely averting your eyes from
the symptom.

regards, tom lane
