Re: Missed check for too-many-children in bgworker spawning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Missed check for too-many-children in bgworker spawning
Date: 2019-11-04 19:58:20
Message-ID: CA+TgmoZ-Ph2gAYKcDHDUUwCF=gpPw=HabVqh8zjVQ_DppuRStA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 4, 2019 at 2:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Is that really true? In the case where it started and failed we expect
> the error queue to have been attached to, and there to be either an
> error 'E' or a 'X' response (cf HandleParallelMessage()). It doesn't
> strike me as very complicated to keep track of whether any worker has
> sent an 'E' or not, no? I don't think we really need the

One of us is confused here, because I don't think that helps. Consider
three background workers Alice, Bob, and Charlie. Alice fails to
launch because fork() fails. Bob launches but then exits unexpectedly.
Charlie has no difficulties and carries out his assigned duties.

Now, the system you are proposing will say that Charlie is OK but
Alice and Bob are a problem. However, that's the way it already works.
What Tom wants is to distinguish Alice from Bob, and your proposal is
of no help at all with that problem, so far as I can see.

> > We certainly can't ignore a worker that managed to start and
> > then bombed out, because it might've already, for example, claimed a
> > block from a Parallel Seq Scan and not yet sent back the corresponding
> > tuples. We could ignore a worker that never started at all, due to
> > EAGAIN or whatever else, but the original process that registered the
> > worker has no way of finding this out.
>
> Sure, but in that case we'd have gotten either an error back from the
> worker, or postmaster would have PANIC restarted everyone due to an
> unhandled error in the worker, no?

An unhandled ERROR in the worker is not a PANIC. I think it's just an
ERROR that ends up being fatal in effect, but even if it's actually
promoted to FATAL, it's not a PANIC.

It is *generally* true that if a worker hits an ERROR, the error will
be propagated back to the leader, but it is not an invariable rule.
One pretty common way that it fails to happen - common in the sense
that it comes up during development, not common on production systems
I hope - is if a worker dies before reaching the call to
pq_redirect_to_shm_mq(). Before that, there's no possibility of
communicating anything. Granted, at that point we shouldn't yet have
done any work that might mess up the query results. Similarly, once
we reach that point, we are dependent on a certain amount of good
behavior for things to work as expected; yeah, any code that calls
proc_exit() is supposed to signal an ERROR or FATAL first, but what if
it doesn't? Granted, in that case we'd probably fail to send an 'X'
message, too, so the leader would still have a chance to realize
something is wrong.

I guess I agree with you to this extent: I made a policy decision that
if a worker is successfully registered but fails to show up, that's an
ERROR. It would be possible to adopt the opposite policy, namely that
if a worker doesn't show up, that's an "oh well." You'd have to be very
certain that the worker wasn't going to show up later, though. For
instance, suppose you check all of the shared memory queues used for
returning tuples and find that every queue is either in a state where
(1) nobody's ever attached to it or (2) somebody attached and then
detached. This is not good enough, because it's possible that after
you checked queue #1, and found it in the former state, someone
attached and read a block, which caused queue #2 to enter the latter
state before you got around to checking it. If you decide that it's OK
to decide that we're done at this point, you'll never return the
tuples that are pushed through queue #1.

But, assuming you nailed the door shut so that such problems could not
occur, I think we could make a decision to ignore workers that failed
before doing anything interesting. Whether that would be a good policy
decision is pretty questionable in my mind. In addition to what I
mentioned before, I think there's a serious danger that errors that
users would have really wanted to know about - or developers would
really want to have known about - would get ignored. You could have
some horrible problem that's making your workers fail to launch, and
the system would just carry on as if everything were fine, except with
bad query plans. I realize that you and others might say "oh, well,
monitor your logs, then," but I think there is certainly some value in
an ordinary user being able to know that things didn't go well without
having to look into the PostgreSQL log for errors. Now, maybe you
think that's not enough value to justify having it work the way it
does today, and I certainly respect that, but I don't view it that way
myself.

What I mostly want to emphasize here is that, while parallel query has
had a number of bugs in this area that were the result of shoddy
design or inadequate testing - principally by me - this isn't one of
them. This decision was made consciously by me because I thought it
gave us the best chance of having a system that would be reliable and
have satisfying behavior for users. Sounds like not everybody agrees,
and that's fine, but I just want to get it out there that this wasn't
accidental on my part.

> > And even if you solved for all of that, I think you might still find
> > that it breaks some parallel query (or parallel create index) code
> > that expects the number of workers to change at registration time, but
> > not afterwards. So, that code would all need to be adjusted.
>
> Fair enough. Although I think practically nearly everything has to be
> ready to handle workers just being slow to start up anyway, no? There's
> plenty cases where we just finish before all workers are getting around
> to do work.

Because of the shutdown race mentioned above, we generally have to
wait for workers to exit before we can shut down parallelism. See commit
2badb5afb89cd569500ef7c3b23c7a9d11718f2f (whose commit message also
documents some of the behaviors now in question). So we tolerate slow
startup in that it doesn't prevent us from getting started on query
execution, but not to the extent that we can finish query execution
without knowing definitively that every worker is either already gone
or will never be showing up.

(Is it possible to do better there? Perhaps. If we could somehow throw
up a brick wall to prevent new workers from doing anything that would
cause problems, then verify that every worker which got past the brick
wall has exited cleanly, then we could ignore the risk of more workers
showing up later, because they'd hit the brick wall before causing any
trouble.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
