Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2017-12-19 15:31:04
Message-ID: CA+TgmoYaqPQ5Uk5jdNGBdqeZHjMHw1TKEbGQLgOOfJuEV9ZFtQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 19, 2017 at 5:01 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I think it would have been much easier to fix this problem if we would
> have some way to differentiate whether the worker has stopped
> gracefully or not. Do you think it makes sense to introduce such a
> state in the background worker machinery?

I think it makes a lot more sense to just throw an ERROR when the
worker doesn't shut down cleanly, which is currently what happens in
nearly all cases. It only fails to happen for fork() failure and
other errors that happen very early in startup. I don't think there's
any point in trying to make this code more complicated to cater to
such cases. If fork() is failing, the fact that parallel query is
erroring out rather than running with fewer workers is likely to be a
good thing. Your principal concern in that situation is probably
whether your attempt to log into the machine and kill some processes
is also going to die with 'fork failure', and having PostgreSQL
consume every available process slot is not going to make that easier.
On the other hand, if workers are failing so early in startup that
they never attach to the error queue, then they're probably all
failing the same way and trying to cope with that problem in any way
other than throwing an error is going to result in parallelism being
silently disabled with no notification to the user, which doesn't seem
good to me either.

So basically I think it's right to treat these as error conditions,
not try to continue the work.
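
To make the policy concrete, here is a minimal sketch, not the actual
parallel.c code. The WorkerOutcome enum and all_workers_ok() are
hypothetical names invented for illustration; the real background
worker machinery reports status differently. The point is only that
any worker that did not finish cleanly, whether from fork() failure or
from dying before attaching to the error queue, makes the leader raise
an error instead of carrying on with fewer workers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical worker outcomes; these are NOT PostgreSQL's real status codes. */
typedef enum
{
    WORKER_FINISHED_CLEANLY,
    WORKER_FORK_FAILED,         /* postmaster could not fork() the worker */
    WORKER_DIED_BEFORE_ATTACH   /* worker exited before attaching to the error queue */
} WorkerOutcome;

/*
 * Returns true only if every worker finished cleanly.  A false result
 * means the caller should report an error rather than silently continue
 * with reduced parallelism.
 */
static bool
all_workers_ok(const WorkerOutcome *outcomes, int nworkers)
{
    assert(nworkers >= 0);

    for (int i = 0; i < nworkers; i++)
    {
        if (outcomes[i] != WORKER_FINISHED_CLEANLY)
            return false;
    }
    return true;
}
```

In a real backend the false case would presumably end in an
ereport(ERROR, ...) call; that is left out here so the sketch compiles
on its own.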

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
