Re: parallel.c is not marked as test covered

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Clément Prévost <prevostclement(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel.c is not marked as test covered
Date: 2016-06-20 16:06:52
Message-ID: CA+TgmoZfTM0H9mQm2T0hQ7pORstfyJvfrELeBvZpqo7Uv8t9Tg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> although I fear we
>> might be getting to a level of tinkering with parallel query that
>> starts to look more like feature development.
>
> Personally, I'm +1 for such tinkering if it makes the feature either more
> controllable or more understandable. After reading the comments at the
> head of nodeGather.c, though, I don't think that single_copy is either
> understandable or useful, and merely renaming it won't help. Apparently,
> it runs code in the worker, except when it doesn't, and even when it does,
> it's absolutely guaranteed to be a performance loss because the leader is
> doing nothing. What in the world is the point?

The single_copy flag allows a Gather node to have a child plan which
is not intrinsically parallel. For example, consider these two plans:

Gather
-> Parallel Seq Scan

Gather
-> Seq Scan

The first plan is safe regardless of the setting of the single-copy
flag. If the plan is executed in every worker, the results in
aggregate across all workers will add up to the results of a
non-parallel sequential scan of the table. The second plan is safe
only if the # of workers is 1 and the single-copy flag is set. If
either of those things is not true, then more than one process might
try to execute the sequential scan, and you'll get N copies of the
output, where N = (# of parallel workers) +
(leader also participates ? 1 : 0).
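
To make the arithmetic concrete (hypothetical numbers; the planner
won't actually generate this broken plan): suppose the second plan
somehow ran with two workers plus a participating leader. Then
N = 2 + 1 = 3, and:

SELECT count(*) FROM t;
-- Each of the 3 processes would scan and return every row of t, so an
-- aggregate on top of the Gather would see three times the true count.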

For force_parallel_mode = {on, regress}, the single-copy behavior is
essential. We can run all of those plans inside a worker, but only
because we know that the leader won't also try to run those same
plans.
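
A quick way to see this from psql (a sketch; the exact EXPLAIN output
varies by version and query):

SET force_parallel_mode = on;
EXPLAIN (COSTS OFF) SELECT 1;
--  Gather
--    Workers Planned: 1
--    Single Copy: true
--    ->  Result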

But it might be useful in other cases too. For example, imagine a
plan like this:

Join
-> Join
  -> Join
    -> Join
      -> Gather (single copy)
        -> Join
          -> Join
            -> Join
              -> Join
                -> Scan (not parallel aware)

This is pipeline parallelism. Instead of having one process do all of
the joins, you can have a worker do some subset of them and then send
the outputs back to the leader, which can do the rest and return the
results to the client. This is actually kind of hard to get right -
according to the literature I've read on parallel query - because you
can get pipeline stalls that erase most or all of the benefit, but
it's a possible area to explore.

Actually, though, the behavior I really want the single_copy flag to
embody is not so much "only one process runs this" but "leader does
not participate unless there are no workers", which is the same thing
only when the budgeted number of workers is one. This is useful
because of plans like this:

Finalize HashAggregate
-> Gather
  -> Partial HashAggregate
    -> Hash Join
      -> Parallel Seq Scan on large_table
      -> Hash
        -> Seq Scan on another_large_table

Unless the # of groups is very small, the leader actually won't
perform very much of the parallel-seq-scan on large_table, because
it'll be too busy aggregating the results from the other workers.
However, if it ever reaches a point where the Gather can't read a
tuple from one of the workers immediately, which is almost certain to
occur right at the beginning of execution, it's going to go build a
copy of the hash table so that it can "help" with the hash join. By
the time it finishes, the workers will have done the same and be
feeding it results, and it will likely get little use out of the copy
that it built itself. But it will still have gone to the effort of
building it.
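
You can watch this happen with EXPLAIN (ANALYZE, VERBOSE), which
reports per-worker row counts for each plan node. A sketch, assuming
tables like the above plus hypothetical id and category columns:

SET max_parallel_workers_per_gather = 2;
EXPLAIN (ANALYZE, VERBOSE)
SELECT l.category, count(*)
FROM large_table l
JOIN another_large_table a ON a.id = l.id
GROUP BY l.category;
-- In the per-worker output, the rows attributed to the leader under
-- the Parallel Seq Scan show how little of the scan it performed, even
-- though it still paid to build its own copy of the hash table.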

For 10.0, Thomas Munro has already done a bunch of work, and will be
doing more, so that we can build a shared hash table, rather than
one copy per worker. That's going to be better when the table is
large anyway, so maybe this particular case won't matter so much. But
in general when a partial path has a substantial startup cost, it may
be better for the leader not to get involved. In a case like this,
it's hard to see how the leader's involvement can ever hurt:

Finalize HashAggregate
-> Gather
  -> Partial HashAggregate
    -> Nested Loop
      -> Parallel Seq Scan on large_table
      -> Index Scan on some_other_table

Even if the leader processes only one or two pages of large_table,
there's no real harm done unless, I suppose, the combine
function is fabulously expensive, which seems unlikely. The lack of
harm stems directly from the fact that there's no startup cost here.
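
One way to check is to compare the startup cost EXPLAIN prints for the
node under the Gather (a sketch; assumes some_other_table has an index
on a hypothetical join column id):

EXPLAIN
SELECT count(*)
FROM large_table l
JOIN some_other_table s ON s.id = l.id;
-- The nested-loop partial path shows a startup cost near zero, whereas
-- the hash-join plan above shows a large one: the price of building
-- the hash table before the first tuple comes out.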

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
