Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2018-01-24 23:37:57
Message-ID: CA+TgmoYHWmC_pv8RkGbjJ-E-JoNL=Ts34vnfAyOVHcdaTAXM7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 24, 2018 at 5:52 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> If we made the Gather node wait only for workers that actually reached
>> the Gather -- either by using a Barrier or by some other technique --
>> then this would be a lot less fragile. And the same kind of technique
>> would work for parallel CREATE INDEX.
>
> The use of a barrier has problems of its own [1], though, of which one
> is unique to the parallel_leader_participation=off case. I thought
> that you yourself agreed with this [2] -- do you?
>
> Another argument against using a barrier in this way is that it seems
> like way too much mechanism to solve a simple problem. Why should a
> client of parallel.h not be able to rely on nworkers_launched (perhaps
> only after "verifying it can be trusted")? That seem like a pretty
> reasonable requirement for clients to have for any framework for
> parallel imperative programming.

Well, I've been resisting that approach from the very beginning of
parallel query. Eventually, I hope that we're going to go in the
direction of changing our mind about how many workers parallel
operations use "on the fly". For example, if there are 8 parallel
workers available and 4 of them are in use, and you start a query (or
index build) that wants 6 but only gets 4, it would be nice if the
other 2 could join later after the other operation finishes and frees
some up. That, of course, won't work very well if parallel operations
are coded in such a way that the number of workers must be nailed down
at the very beginning.

Now maybe all that seems like pie in the sky, and perhaps it is, but I
hold out hope. For queries, there is another consideration, which is
that some queries may run with parallelism but actually finish quite
quickly - it's not desirable to make the leader wait for workers to
start when it could be busy computing. That's a lesser consideration
for bulk operations like parallel CREATE INDEX, but even there I don't
think it's totally negligible.

For both reasons, it's much better, or so it seems to me, if parallel
operations are coded to work with the number of workers that show up,
rather than being inflexibly tied to a particular worker count.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-01-24 23:46:59 Re: reducing isolation tests runtime
Previous Message Tom Lane 2018-01-24 23:37:18 Re: pgsql: Add parallel-aware hash joins.