Re: Instability in select_parallel regression test

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Instability in select_parallel regression test
Date: 2017-02-18 17:19:25
Message-ID: CA+Tgmob94oaf2Z32wZLpk_zt3VXWeNkXZb5nKtOd-j9wfkYwOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 17, 2017 at 9:57 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> That seems like a seriously broken design to me, first because it can make
>> for a significant delay in the slots becoming available (which is what's
>> evidently causing these regression failures), and second because it's
>> simply bad design to load extra responsibilities onto the postmaster.
>> Especially ones that involve touching shared memory.

It's a little surprising to me that the delay we're seeing here is
significant, because the death of a child should cause an immediate
SIGCHLD, resulting in a call to reaper(), resulting in a call to
waitpid(). Why's that not working?

>> I think this needs to be changed, and promptly. Why in the world don't
>> you simply have the workers clearing their slots when they exit?
>
> There seem to be many reasons why exit of background workers is done
> by postmaster like when they have to restart after a crash or if
> someone terminates them (TerminateBackgroundWorker()) or if the
> notification has to be sent at exit (bgw_notify_pid). Moreover, there
> can be background workers without shared memory access as well which
> can't clear state from shared memory. Another point to think is that
> background workers contain user-supplied code, so not sure how good it
> is to give them control of tinkering with shared data structures of
> the backend. Now, one can argue that for some of the cases where it
> is possible background worker should cleanup shared memory state at
> the exit, but I think it is not directly related to the parallel
> query. As far as the parallel query is concerned, it just needs to
> wait for workers exit to ensure that no more operations can be
> performed in workers after that point so that it can accumulate stats
> and retrieve some fixed parallel state information.

That's a pretty good list of reasons with which I agree. To
underscore one of Amit's points, bgw_notify_pid requires SIGUSR1 to be
sent to the process that registered the worker if that process is
still running, both when the process is started and when it
terminates. The workers have no way of keeping track of whether the
process that did the registration is still running, and with bad luck
might SIGUSR1 an unrelated process (possibly killing it). And those
SIGUSR1s are absolutely necessary, because otherwise a process that is
waiting for a worker to start or stop would have to poll, which would
make the whole system more resource-intensive and a whole lot less
responsive.

There are a few more reasons Amit didn't mention:

1. Workers can fail during startup before they can possibly have done
enough work to be able to mark their slots as being free. On Linux,
fork() can fail; on Windows, you can fail either to start a new
process or to have it reattach to the shared memory segment. You
could try to have the postmaster catch just those cases where the
worker doesn't get far enough and have the worker do the others, but
that's almost bound to make things more complex and fragile. I can't
see us coming out ahead there.

2. Worker slots don't get freed when the process terminates; they get
freed when the background worker is not running and will never again
be restarted. So there's a subtle lifespan difference here: a
background worker slot is consumed before any process is started, and
persists across possibly many restarts of that process, and is finally
freed when the process is both not currently running and not ever
going to be restarted. It would no doubt be a bit fragile if the
worker tried to guess whether or not the postmaster was going to
restart it and free the slot only if not -- what happens if the
postmaster comes to a different conclusion?

>>> I think what we need to do
>>> here is to move the test that needs workers to execute before other
>>> parallel query tests where there is no such requirement.
>>
>> That's not fixing the problem, it's merely averting your eyes from
>> the symptom.
>>
> I am not sure why you think so. Parallel query is capable of running
> without workers even if the number of planned workers are not
> available and there are many reasons for same. In general, I think
> the tests should not rely on the availability of background workers
> and if there is a test like that then it should be the responsibility
> of the test to ensure the same.

I agree with that, too. One thing that I'm a little concerned about,
though, is that users could also see the kind of behavior Tom seems to
be describing here and might find it surprising. For example, suppose
your queries all use 4 workers and you have 4 workers configured. If
you're the only user on the system and you run query after query after
query, you expect that all of those queries are going to get all 4
workers. If the postmaster is so unresponsive that you don't get all
of them, or worse you don't get any, you might be tempted to swear at
the stupid PostgreSQL developer who engineered this system. (Hi,
swearing people!) I'd like to understand whether this is something
that happens once-in-a-blue-moon on CLOBBER_CACHE_RECURSIVELY critters
or whether it's a pretty regular thing. If it's a regular thing,
that's kinda disappointing, because it implies that the operating
system feels free to take a little holiday before delivering SIGCHLD,
or something like that.

I guess one possible solution to this problem might be to change
ExecParallelFinish's call to WaitForParallelWorkersToFinish() to
instead call WaitForParallelWorkersToExit() -- and maybe just removing
the former, since at that point it would have no remaining callers and
no real use, if we believe that waiting for the slot to become free is
the right behavior. The downside of this is that it increases the lag
to shut down a parallel query over what's basically a corner case.
Yeah, there might be some cases where not being able to reuse the
worker slots you just freed hurts, but is it worth making all of the
queries that don't care about that a little slower to cater to the
cases where you do care? I'm not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-02-18 17:35:15 Re: new gcc 7.0.1 warnings
Previous Message Dilip Kumar 2017-02-18 17:15:40 Re: Parallel bitmap heap scan