Re: [HACKERS] parallel.c oblivion of worker-startup failures

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date: 2018-01-25 00:29:09
Message-ID: CAH2-WzkXtsfjfVyw89AOWSxLWG1QQaqkCrNa2axk5tmeSRg+zQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 24, 2018 at 3:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Well, I've been resisting that approach from the very beginning of
> parallel query. Eventually, I hope that we're going to go in the
> direction of changing our mind about how many workers parallel
> operations use "on the fly". For example, if there are 8 parallel
> workers available and 4 of them are in use, and you start a query (or
> index build) that wants 6 but only gets 4, it would be nice if the
> other 2 could join later after the other operation finishes and frees
> some up.

That seems like a worthwhile high-level goal.
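
The quoted scenario can be modeled with a toy C sketch. It assumes a single fixed-size pool of workers shared by all parallel operations. WorkerPool, ParallelOp, pool_request() and pool_release() are hypothetical names invented for illustration, not PostgreSQL APIs. The second call to pool_request() stands in for the "join later" capability, which the quoted text describes as a hope rather than something that exists.

```c
#include <assert.h>

/* Hypothetical model of a fixed pool of workers shared by all
 * parallel operations (names invented for illustration). */
typedef struct WorkerPool
{
    int capacity;   /* total workers that may exist at once */
    int in_use;     /* workers currently assigned to some operation */
} WorkerPool;

typedef struct ParallelOp
{
    int wanted;     /* workers the operation asked for */
    int granted;    /* workers it actually holds right now */
} ParallelOp;

/* Grant as many free workers as possible, up to what the op still wants.
 * Calling this again later models workers "joining late". */
static void
pool_request(WorkerPool *pool, ParallelOp *op)
{
    int free_workers = pool->capacity - pool->in_use;
    int want = op->wanted - op->granted;
    int n = want < free_workers ? want : free_workers;

    pool->in_use += n;
    op->granted += n;
}

/* Return all of an operation's workers to the pool when it finishes. */
static void
pool_release(WorkerPool *pool, ParallelOp *op)
{
    assert(pool->in_use >= op->granted);
    pool->in_use -= op->granted;
    op->granted = 0;
}
```

Starting from 4 of 8 workers busy, an operation that wants 6 is granted 4; once the other operation releases its 4 workers, a second request tops it up to 6.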

I remember looking into Intel Threading Building Blocks many years
ago, and seeing some interesting ideas there. According to Wikipedia,
"TBB implements work stealing to balance a parallel workload across
available processing cores in order to increase core utilization and
therefore scaling". The programmer does not operate in terms of an
explicit number of threads, and there are probably certain types of
problems where this is an advantage.
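
To show what work stealing means, here is a single-threaded C simulation. It assumes one deque of task ids per worker. An owner takes its newest task from the bottom, and an idle worker steals the oldest task from the top of another worker's deque. This is not TBB's implementation. Real work-stealing schedulers use concurrent deques and run workers in parallel. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NWORKERS 4
#define MAXTASKS 64

/* One double-ended queue of task ids per worker. */
typedef struct Deque
{
    int tasks[MAXTASKS];
    int top;      /* index of oldest task (thieves take from here) */
    int bottom;   /* one past newest task (owner pushes/pops here) */
} Deque;

static void
push_bottom(Deque *d, int task)
{
    assert(d->bottom < MAXTASKS);
    d->tasks[d->bottom++] = task;
}

/* Owner takes its most recently pushed task (LIFO end). */
static bool
pop_bottom(Deque *d, int *task)
{
    if (d->bottom == d->top)
        return false;
    *task = d->tasks[--d->bottom];
    return true;
}

/* A thief takes the oldest task from another worker's deque (FIFO end). */
static bool
steal_top(Deque *d, int *task)
{
    if (d->bottom == d->top)
        return false;
    *task = d->tasks[d->top++];
    return true;
}

/* Sequential simulation: in each round every worker runs one task,
 * stealing from the first non-empty victim if its own deque is empty.
 * Returns the number of rounds; done[w] counts tasks run by worker w. */
static int
simulate(Deque deques[NWORKERS], int done[NWORKERS])
{
    int rounds = 0;
    bool progress = true;

    while (progress)
    {
        progress = false;
        for (int w = 0; w < NWORKERS; w++)
        {
            int task;
            bool got = pop_bottom(&deques[w], &task);

            for (int v = 1; !got && v < NWORKERS; v++)
                got = steal_top(&deques[(w + v) % NWORKERS], &task);
            if (got)
            {
                done[w]++;      /* "running" a task is just counting it */
                progress = true;
            }
        }
        if (progress)
            rounds++;
    }
    return rounds;
}
```

If all 16 tasks start in worker 0's deque, the other three workers steal until each has run 4 tasks, with no up-front division of the work.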

That model also has its costs, though, and I don't think it's ever
going to supplant a lower-level approach. In an ideal world, you have
both things, because TBB's approach apparently has high coordination
overhead on many-core systems.

> That, of course, won't work very well if parallel operations
> are coded in such a way that the number of workers must be nailed down
> at the very beginning.

But my whole approach to sorting is based on the idea that each worker
produces a roughly even amount of output to merge. I don't see any
scope to do better for parallel CREATE INDEX. (Other uses for parallel
sort are another matter, though.)
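
The final step of this design is a merge of one sorted run per participant. A minimal C sketch follows. It assumes each run is a sorted int array rather than tuples on tapes. Run and merge_runs() are hypothetical. A real implementation would typically use a binary heap instead of the linear scan used here. Note that nruns has to match the number of participants that actually produced a run, which is one reason an accurate worker count matters.

```c
#include <assert.h>
#include <stddef.h>

/* One sorted run produced by a (hypothetical) participant. */
typedef struct Run
{
    const int *vals;
    size_t len;
    size_t pos;     /* next unread element */
} Run;

/* Merge nruns sorted runs into out[], which must have room for every
 * element; returns the number of elements written.  Picks the smallest
 * head each step with a linear scan, fine for a handful of runs. */
static size_t
merge_runs(Run *runs, int nruns, int *out)
{
    size_t n = 0;

    assert(nruns >= 0);
    for (;;)
    {
        int best = -1;

        for (int i = 0; i < nruns; i++)
        {
            if (runs[i].pos == runs[i].len)
                continue;   /* this run is exhausted */
            if (best < 0 ||
                runs[i].vals[runs[i].pos] < runs[best].vals[runs[best].pos])
                best = i;
        }
        if (best < 0)
            return n;       /* every run is exhausted */
        out[n++] = runs[best].vals[runs[best].pos++];
    }
}
```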

> Now maybe all that seems like pie in the sky, and perhaps it is, but I
> hold out hope. For queries, there is another consideration, which is
> that some queries may run with parallelism but actually finish quite
> quickly - it's not desirable to make the leader wait for workers to
> start when it could be busy computing. That's a lesser consideration
> for bulk operations like parallel CREATE INDEX, but even there I don't
> think it's totally negligible.

Since I don't have to check this until the leader stops participating
as a worker, there is no wait in the leader. In the vast majority of
cases, a call to something like WaitForParallelWorkersToAttach() ends
up looking at state in shared memory, immediately determining that
every launched process initialized successfully. The overhead should
be negligible in the real world.
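
The cheap common case described above can be sketched in C. It assumes each launched worker writes its startup state into a slot of a shared array, with C11 atomics standing in for shared memory. SharedStatus, WorkerState, and check_workers_attached() are hypothetical and are not how parallel.c records worker startup. A blocking variant would sleep and retry while the result is STILL_WAITING.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-worker startup state kept in shared memory. */
typedef enum { WORKER_STARTING, WORKER_ATTACHED, WORKER_FAILED } WorkerState;

#define MAX_WORKERS 8

typedef struct SharedStatus
{
    _Atomic int state[MAX_WORKERS];   /* a WorkerState, set by each worker */
} SharedStatus;

typedef enum { ALL_ATTACHED, SOME_FAILED, STILL_WAITING } AttachResult;

/* Non-blocking check over the slots of workers that were launched.
 * In the common case every slot already says ATTACHED, so this reads
 * nworkers_launched values and returns at once. */
static AttachResult
check_workers_attached(SharedStatus *shared, int nworkers_launched)
{
    AttachResult result = ALL_ATTACHED;

    assert(nworkers_launched <= MAX_WORKERS);
    for (int i = 0; i < nworkers_launched; i++)
    {
        int s = atomic_load(&shared->state[i]);

        if (s == WORKER_FAILED)
            return SOME_FAILED;
        if (s == WORKER_STARTING)
            result = STILL_WAITING;
    }
    return result;
}
```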

> For both reasons, it's much better, or so it seems to me, if parallel
> operations are coded to work with the number of workers that show up,
> rather than being inflexibly tied to a particular worker count.

I've been clear from day one that my approach to parallel tuplesort
isn't going to be that useful to parallel query in its first version.
You need some kind of partitioning (a distribution sort of some kind)
for that, and probably plenty of cooperation from within the executor.
I've also said that I don't think we can do much better for parallel
CREATE INDEX even *with* support for partitioning, which is something
borne out by comparisons with other systems. My patch was always
presented as an 80/20 solution.
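
For contrast, the partitioning step of a distribution sort can be sketched in C. It assumes sorted splitter keys have already been chosen, for example by sampling, which is omitted here. partition_for_key() is a hypothetical helper. It sends each key to the participant responsible for its key range, so the sorted partitions can simply be concatenated instead of merged.

```c
#include <assert.h>

/* Route a key to one of nsplitters + 1 partitions using sorted splitter
 * keys: partition i receives keys in (splitters[i-1], splitters[i]],
 * and the last partition receives everything above the last splitter.
 * Binary search for the first splitter >= key. */
static int
partition_for_key(const int *splitters, int nsplitters, int key)
{
    int lo = 0, hi = nsplitters;    /* answer lies in [lo, hi] */

    assert(nsplitters >= 0);
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (key <= splitters[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}
```

With splitters {10, 20, 30} there are four partitions. Keys up to 10 go to partition 0, 11 to 20 go to partition 1, 21 to 30 go to partition 2, and anything above 30 goes to partition 3.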

I have given you specific technical reasons why I think that using a
barrier is at least a bad idea for nbtsort.c, and probably for
nodeGather.c, too. Those problems will need to be worked through if
you're not going to concede the point on using a barrier. Your
aspirations around allowing workers to join later seem like good
ones, broadly speaking, but they are not particularly applicable to
how *anything* happens to work now.

Besides all this, I'm not even suggesting that I need to know the
number of workers up front for parallel CREATE INDEX. Perhaps
nworkers_launched can be incremented after the fact following some
later enhancement to the parallel infrastructure, in which case
parallel CREATE INDEX will theoretically be prepared to take advantage
right away (though other parallel sort operations seem more likely to
*actually* benefit). That will be a job for the parallel
infrastructure, though, not for each and every parallel operation --
how else could we possibly hope to add more workers that become
available halfway through, as part of a future enhancement to the
parallel infrastructure? Surely not every caller of
CreateParallelContext() should need to invent its own way of
doing this.

All I want is to be able to rely on nworkers_launched. That's not in
tension with this other goal/aspiration, and actually seems to
complement it.
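
The division of labor argued for here can be sketched in C under stated assumptions. Only infrastructure code ever changes the launched count. A caller simply reads that count when it needs it, for example when sizing its final merge after the leader stops participating as a worker. ToyParallelContext, infra_add_late_worker() and merge_input_count() are hypothetical. PostgreSQL's ParallelContext does have an nworkers_launched field, but the message treats incrementing it after the fact as a possible later enhancement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a parallel context: just the two counts. */
typedef struct ToyParallelContext
{
    int nworkers_requested;   /* workers the operation asked for */
    int nworkers_launched;    /* changed only by infrastructure code */
} ToyParallelContext;

/* Infrastructure side: admit a worker that became available part way
 * through, if the operation still wants more.  True if one was added. */
static bool
infra_add_late_worker(ToyParallelContext *pcxt)
{
    if (pcxt->nworkers_launched >= pcxt->nworkers_requested)
        return false;
    pcxt->nworkers_launched++;
    return true;
}

/* Caller side: number of sorted runs to merge, read from the context
 * rather than tracked separately by the caller. */
static int
merge_input_count(const ToyParallelContext *pcxt, bool leader_participates)
{
    assert(pcxt->nworkers_launched >= 0);
    return pcxt->nworkers_launched + (leader_participates ? 1 : 0);
}
```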

--
Peter Geoghegan
