Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Date: 2016-06-20 06:42:54
Message-ID: CAA4eK1L4fTEebm5dvVw3G04Ksv-ya+QinnuuC7+97Q6xa7ASFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 20, 2016 at 11:48 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> Hi all,
>
> My colleague noticed that the output of EXPLAIN ANALYZE doesn't work
> fine for parallel seq scan.
>
> postgres(1)=# explain analyze verbose select count(*) from
pgbench_accounts ;
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------
> Finalize Aggregate (cost=217018.55..217018.56 rows=1 width=8)
> (actual time=2640.015..2640.015 rows=1 loops=1)
> Output: count(*)
> -> Gather (cost=217018.33..217018.54 rows=2 width=8) (actual
> time=2639.064..2640.002 rows=3 loops=1)
> Output: (PARTIAL count(*))
> Workers Planned: 2
> Workers Launched: 2
> -> Partial Aggregate (cost=216018.33..216018.34 rows=1
> width=8) (actual time=2632.714..2632.715 rows=1 loops=3)
> Output: PARTIAL count(*)
> Worker 0: actual time=2632.583..2632.584 rows=1 loops=1
> Worker 1: actual time=2627.517..2627.517 rows=1 loops=1
> -> Parallel Seq Scan on public.pgbench_accounts
> (cost=0.00..205601.67 rows=4166667 width=0) (actual
> time=0.042..1685.542 rows=3333333 loops=3)
> Worker 0: actual time=0.033..1657.486 rows=3457968
loops=1
> Worker 1: actual time=0.039..1702.979 rows=3741069
loops=1
> Planning time: 1.026 ms
> Execution time: 2640.225 ms
> (15 rows)
>
> For example, the above result shows,
> Parallel Seq Scan : actual rows = 3333333
> worker 0 : actual rows = 3457968
> worker 1 : actual rows = 3741069
> Summation of these is 10532370, but actual total rows is 10000000.
> I think that Parallel Seq Scan should show actual rows =
> 10000000(total rows) or actual rows = 2800963(rows collected by
> itself). (10000000 maybe better)
>

You have to read the rows at Parallel Seq Scan nodes as total count of
rows, but you have to consider the loops parameter as well.

>
> After spent time to investigate this behaviour, ISTM that the problem
> is nloops of Parallel Seq Scan.
> Parallel Seq Scan is done only once, but nloops is incremented to 3.
>

nloops here indicates, that it is done for 2 workers and a master backend.

> So its "actual rows" is calculated 3333333(10000000 / 3) at
explain.c:L1223.
>

Thats how it should be considered. You might want to compare the behaviour
with other cases where value of nloops is used.

> Is it a bug?
>

I don't think so.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2016-06-20 06:54:03 Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Previous Message Masahiko Sawada 2016-06-20 06:18:56 Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .