Re: Rename max_parallel_degree?

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rename max_parallel_degree?
Date: 2016-04-30 08:54:09
Message-ID: CAKJS1f8_3y6XHp2xP5YOvPetXMuCmiWP6zWR_eteRQy2fCn9mQ@mail.gmail.com
Lists: pgsql-hackers

On 29 April 2016 at 02:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Apr 27, 2016 at 1:05 PM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
>> Robert Haas wrote:
>>> Of course, we could make this value 1-based rather than 0-based, as
>>> Peter Geoghegan suggested a while back. But as I think I said at the
>>> time, I think that's more misleading than helpful. The leader
>>> participates in the parallel plan, but typically does far less of the
>>> work beneath the Gather node than the other nodes involved in the
>>> query, often almost none. In short, the leader is special.
>>> Pretending that it's just another process involved in the parallel
>>> group isn't doing anyone a favor.
>>
>> FWIW, that's not how it looks from the outside (top or vmstat).
>> I'm ignorant about how parallel tasks are assigned in the planner,
>> but when trying various values for max_parallel_degree and running
>> simple aggregates on large tables on a single 4 core CPU doing
>> nothing else, I'm only ever seeing max_parallel_degree+1 processes
>> indiscriminately at work, often in the same state (R running or
>> D waiting for disk).
>
> Right, but they're probably not doing the SAME work. You can look at
> EXPLAIN (ANALYZE, VERBOSE, BUFFERS) to see. Of course, all the work
> above the Gather node is being done by the leader, but the stuff below
> the Gather node often has a bit of participation from the leader, but
> is mostly the workers.

Robert, I'd imagine that most of the tests which lead you to think that
came from parallel seq scan, whereas Daniel's probably come from
testing something like parallel aggregate, or at least something that
gives the workers a decent amount of work per tuple returned.

With the setting:

# set max_parallel_degree = 8;

Given a table like:

# create table t1 (num int not null);

Populated with:

# insert into t1 select generate_series(1,10000000);

Given the query:

# explain (analyze, verbose) select count(*) from t1;

if we think about what'll happen here, each worker will go off, grab
all of the tuples it can, and aggregate each one. Meanwhile the main
process would otherwise be quite idle waiting for the workers to come
back with their partially aggregated results, so instead, to keep
itself busy, it goes off and helps them out. We can see this in the
explain analyze verbose output:

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=80508.54..80508.55 rows=1 width=8) (actual time=605.019..605.019 rows=1 loops=1)
   Output: pg_catalog.count(*)
   ->  Gather  (cost=80508.13..80508.53 rows=4 width=8) (actual time=604.799..605.011 rows=5 loops=1)
         Output: (count(*))
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=79508.13..79508.13 rows=1 width=8) (actual time=585.099..585.100 rows=1 loops=5)
               Output: count(*)
               Worker 0: actual time=579.736..579.736 rows=1 loops=1
               Worker 1: actual time=580.669..580.669 rows=1 loops=1
               Worker 2: actual time=580.512..580.513 rows=1 loops=1
               Worker 3: actual time=580.649..580.649 rows=1 loops=1
               ->  Parallel Seq Scan on public.t1  (cost=0.00..72456.10 rows=2820810 width=0) (actual time=2.310..404.978 rows=2000000 loops=5)
                     Output: num
                     Worker 0: actual time=2.231..397.251 rows=1892702 loops=1
                     Worker 1: actual time=3.107..403.436 rows=1983602 loops=1
                     Worker 2: actual time=3.030..403.082 rows=1952188 loops=1
                     Worker 3: actual time=3.135..404.756 rows=2039650 loops=1

If we look at the total number of rows that each of the workers
managed to chew through, and subtract that from the total rows in t1:

# select 10000000 - (1892702 + 1983602 + 1952188 + 2039650);
?column?
----------
2131858
(1 row)

So the main process managed to get through 2131858 rows while waiting
for the helpers to finish their work. Daniel looks right.
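
Just to put that in perspective (nothing new here, only restating the
numbers from the plan above), the leader chewed through roughly a fifth
of the table, which is about what you'd expect with 5 participants
pulling tuples at similar rates:

# select round(2131858::numeric / 10000000 * 100, 1) as leader_pct;
 leader_pct
------------
       21.3
(1 row)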

Another example, this time with no aggregation:

postgres=# explain (analyze,verbose) select * from t1 where num > 9500000;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..126718.15 rows=502200 width=4) (actual time=454.071..694.967 rows=500000 loops=1)
   Output: num
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.t1  (cost=0.00..76753.65 rows=125550 width=4) (actual time=430.240..451.990 rows=100000 loops=5)
         Output: num
         Filter: (t1.num > 9500000)
         Rows Removed by Filter: 1900000
         Worker 0: actual time=421.220..449.077 rows=130128 loops=1
         Worker 1: actual time=428.549..456.059 rows=125430 loops=1
         Worker 2: actual time=421.372..447.751 rows=113904 loops=1
         Worker 3: actual time=427.140..454.118 rows=130402 loops=1

# select 500000 - (130128 + 125430 + 113904 + 130402);
?column?
----------
136
(1 row)

The main process only managed to find time for 136 rows! Robert is right.
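
Again, only restating the numbers above: that's well under a tenth of a
percent of the rows returned through the Gather node:

# select round(136::numeric / 500000 * 100, 3) as leader_pct;
 leader_pct
------------
      0.027
(1 row)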

So I'd say this very much depends on how busy the main process is
pulling rows from each worker.

It would also be quite nice if we could see at a glance how much the
main process did, without having to subtract what all the workers
managed to do.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
