Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .

From: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Date: 2016-06-20 07:06:04
Message-ID: CAA8sozfo+wveFciCa3ZJ2X7pG9feDZQ8s+eqw1_NjbWFXsRXEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

2016-06-20 15:42 GMT+09:00 Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>:
> On Mon, Jun 20, 2016 at 11:48 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>> Hi all,
>>
>> My colleague noticed that the output of EXPLAIN ANALYZE doesn't work
>> fine for parallel seq scan.
>>
>> postgres(1)=# explain analyze verbose select count(*) from
>> pgbench_accounts ;
>> QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------
>> Finalize Aggregate (cost=217018.55..217018.56 rows=1 width=8)
>> (actual time=2640.015..2640.015 rows=1 loops=1)
>> Output: count(*)
>> -> Gather (cost=217018.33..217018.54 rows=2 width=8) (actual
>> time=2639.064..2640.002 rows=3 loops=1)
>> Output: (PARTIAL count(*))
>> Workers Planned: 2
>> Workers Launched: 2
>> -> Partial Aggregate (cost=216018.33..216018.34 rows=1
>> width=8) (actual time=2632.714..2632.715 rows=1 loops=3)
>> Output: PARTIAL count(*)
>> Worker 0: actual time=2632.583..2632.584 rows=1 loops=1
>> Worker 1: actual time=2627.517..2627.517 rows=1 loops=1
>> -> Parallel Seq Scan on public.pgbench_accounts
>> (cost=0.00..205601.67 rows=4166667 width=0) (actual
>> time=0.042..1685.542 rows=3333333 loops=3)
>> Worker 0: actual time=0.033..1657.486 rows=3457968
>> loops=1
>> Worker 1: actual time=0.039..1702.979 rows=3741069
>> loops=1
>> Planning time: 1.026 ms
>> Execution time: 2640.225 ms
>> (15 rows)
>>
>> For example, the above result shows,
>> Parallel Seq Scan : actual rows = 3333333
>> worker 0 : actual rows = 3457968
>> worker 1 : actual rows = 3741069
>> Summation of these is 10532370, but actual total rows is 10000000.
>> I think that Parallel Seq Scan should show actual rows =
>> 10000000(total rows) or actual rows = 2800963(rows collected by
>> itself). (10000000 maybe better)
>>
>
> You have to read the rows at Parallel Seq Scan nodes as total count of rows,
> but you have to consider the loops parameter as well.

IMHO, "actual rows" of "Parallel Seq Scan" should not be divided by the loops,
because the total rows processed here is 10000000, not 3333333 * 3.
I think the actual row value shown here "3333333 " is a bit confusing and tricky
for users.

>> After spent time to investigate this behaviour, ISTM that the problem
>> is nloops of Parallel Seq Scan.
>> Parallel Seq Scan is done only once, but nloops is incremented to 3.
>
> nloops here indicates, that it is done for 2 workers and a master backend.

Right, but I'm not sure "loops" is an appropriate word for this context.

>> So its "actual rows" is calculated 3333333(10000000 / 3) at
>> explain.c:L1223.
>>
>
> Thats how it should be considered. You might want to compare the behaviour
> with other cases where value of nloops is used.
>
>> Is it a bug?
>>
>
> I don't think so.

I would like to propose a few modification for parallel queries to
make it more clear.

>> -> Parallel Seq Scan on public.pgbench_accounts (cost=0.00..205601.67 rows=4166667 width=0) (actual time=0.042..1685.542 rows=10000000 workers=2)
>> Parent: actual time=0.033..1657.486 rows=2800963loops=1
>> Worker 0: actual time=0.033..1657.486 rows=3457968 loops=1
>> Worker 1: actual time=0.039..1702.979 rows=3741069 loops=1

How about this?

Regards,
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-06-20 07:06:09 Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
Previous Message Masahiko Sawada 2016-06-20 06:54:03 Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .