Re: Observations in Parallel Append

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Observations in Parallel Append
Date: 2018-01-03 04:11:22
Message-ID: CAA4eK1+HoY_bNi0RJ4SogCpv+twT=Eg7KdWG_+VgA2SgpVH3Ng@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 27, 2017 at 12:09 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Dec 24, 2017 at 8:37 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> Okay, but is it appropriate to use the parallel_divisor? The
>> parallel_divisor means the contribution of all the workers (+
>> leader_contribution) whereas for non-partial paths there will be
>> always only the subset of workers which will operate on them.
>> Consider a case with one non-partial subpath and five partial subpaths
>> with six as parallel_divisor, now the current code will try to divide
>> the rows of non-partial subpath with respect to six workers. However,
>> in reality, there will always be one worker which will execute that
>> path.
>
> That's true, of course, but if five processes each return 0 rows and
> the sixth process returns 600 rows, the average number of rows per
> process is 100, not anything else.
>
> Here's one way to look at it. Suppose there is a table with 1000
> partitions. If we do a Parallel Append over a Parallel Seq Scan per
> partition, we will come up with a row estimate by summing the
> estimated row count across all partitions and dividing by the
> parallel_divisor. This will give us some answer. If we instead do a
> Parallel Append over a Seq Scan per partition, we should really come
> up with the *same* estimate. The only way to do that is to also
> divide by the parallel_divisor in this case.
>
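
(For reference, the divisor under discussion is the one computed by
get_parallel_divisor() in costsize.c. From memory, and glossing over
details such as the parallel_leader_participation check, it looks
roughly like the sketch below; for 3 planned workers it yields
3 + 0.1 = 3.1, and for a single-worker partial scan 1 + 0.7 = 1.7.)

    /* Rough sketch of get_parallel_divisor(), for reference only. */
    double
    get_parallel_divisor(Path *path)
    {
        /* each planned worker contributes one full share of the work ... */
        double      parallel_divisor = path->parallel_workers;

        /* ... and the leader contributes a partial share, which shrinks as
         * it has more workers' tuples to service */
        double      leader_contribution = 1.0 - (0.3 * path->parallel_workers);

        if (leader_contribution > 0)
            parallel_divisor += leader_contribution;

        return parallel_divisor;
    }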

The theory sounds good, but it looks like the code doesn't behave that
way. To verify it, my colleague Dilip and I ran a few tests, the
output of which is below:

Test setup
-------------------
create table t(a int, b varchar) partition by range(a);
create table t1 partition of t for values from (0) to (100000);
create table t2 partition of t for values from (100000) to (200000);
create table t3 partition of t for values from (200000) to (300000);
create table t4 partition of t for values from (300000) to (400000);
insert into t values (generate_series(1,399999), repeat('x', 50));
analyze t;

Test
-------
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
set max_parallel_workers_per_gather=6;

postgres=# explain select * from t;
                                      QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=0.00..6476.94 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6476.94 rows=235295 width=55)
         ->  Parallel Seq Scan on t1  (cost=0.00..1619.23 rows=58823 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                                      QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=0.00..6888.70 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6888.70 rows=208730 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t2 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t3 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t4 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                               QUERY PLAN
-----------------------------------------------------------------------
 Gather  (cost=0.00..4061.99 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..4061.99 rows=129032 width=55)
         ->  Seq Scan on t2  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t3  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t4  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
(7 rows)

The row estimate for the Parallel Append differs across the cases:
(a) Row estimate for Parallel Append over a Parallel Seq Scan per
partition - 235295
(b) Row estimate for Parallel Append over a mix of Parallel Seq Scan
and Seq Scan per partition - 208730
(c) Row estimate for Parallel Append over a Seq Scan per partition - 129032
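
A rough sanity check of where these numbers come from (assuming the
usual divisors of 1.7 for a single-worker partial scan and 3.1 for the
three-worker Gather): each ~100000-row partition scanned in parallel is
reported as 100000 / 1.7 ~= 58824 rows, so summing the partial subpaths
as-is gives about 235295 in case (a); dividing only the non-partial
subpath by the Append's divisor gives (3 * 58824) + (99999 / 3.1) ~=
208730 in case (b); and dividing the full 399999 rows by 3.1 gives
about 129032 in case (c), which is the only value consistent with
spreading the whole table over the Gather's 3.1 worker-shares.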

I think the reason for the different estimates is that the partial
paths' row counts have already been divided by their own parallel
divisor, which differs from the Append's. If the row estimate for a
partial path is scaled back by the subpath's own parallel divisor
before being divided by the Append's divisor, we get consistent
results. I have tried the below change:

+ apath->path.rows += (subpath->rows * subpath_parallel_divisor /
+                      parallel_divisor);
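
For clarity, here is a minimal sketch of the loop the change sits in
(hypothetical surrounding code for illustration only; apart from the
"+=" line above, none of this is taken from the actual patch, and
subpath_parallel_divisor / parallel_divisor are the variables used in
the snippet):

    foreach(l, apath->subpaths)
    {
        Path       *subpath = (Path *) lfirst(l);

        /* divisor the subpath's own row estimate was already scaled by:
         * greater than 1 for a partial path, 1.0 for a non-partial path */
        double      subpath_parallel_divisor =
            (subpath->parallel_workers > 0) ? get_parallel_divisor(subpath) : 1.0;

        /* scale the per-worker estimate back to total rows, then spread it
         * over the Append's own divisor, so partial and non-partial
         * subpaths are treated consistently */
        apath->path.rows += (subpath->rows * subpath_parallel_divisor /
                             parallel_divisor);
    }

With that, all three cases reduce to roughly 400000 total rows divided
by 3.1, i.e. about 129032.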

The results after the change:
postgres=# explain select * from t;
                                      QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=0.00..6476.94 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6476.94 rows=129032 width=55)
         ->  Parallel Seq Scan on t1  (cost=0.00..1619.23 rows=58823 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                                      QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=0.00..6888.70 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6888.70 rows=129032 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t4 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t3 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t2 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                               QUERY PLAN
-----------------------------------------------------------------------
 Gather  (cost=0.00..4061.99 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..4061.99 rows=129032 width=55)
         ->  Seq Scan on t2  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t3  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t4  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
(7 rows)

Note that after the change the row estimate (129032) for Parallel
Append is consistent across all three cases.

Attached, please find the patch that fixes this issue (thanks to Dilip
for helping identify the above case and the fix). I have also modified
the comment atop choose_next_subplan_for_worker() as discussed above.
The change to remove the unnecessary inclusion of spin.h is still in a
separate patch, as posted earlier.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment               Content-Type              Size
fix_row_est_pa_v1.patch  application/octet-stream  2.3 KB
