Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Hubert Lubaczewski <depesz(at)depesz(dot)com>
Cc: pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com
Date: 2017-11-28 07:23:41
Message-ID: CAA4eK1LxyBgkFYxtoW-ZTaC_jkk6Ck8UB+3XHHT1a4EnZ52Bcg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 27, 2017 at 11:26 PM, hubert depesz lubaczewski
<depesz(at)depesz(dot)com> wrote:
> On Mon, Nov 27, 2017 at 05:24:49PM +0530, Amit Kapila wrote:
>> I think it is "actual_time * 1" for anything below Gather.
>
> Well, I think I found another problem.
>
> Please take a look at:
> https://explain.depesz.com/s/Bs8c
>
> Node 15 is Gather with loops = 2974 (because it was run by a nested loop).
>
> There were 4 workers, so 5 processes working, but the loops count on the
> parallel seq scan is 17.698 million?!
>

The number of loops displayed on the parallel seq scan does seem to be wrong; see below.

> The problem is that for explain.depesz.com I'm calculating how much time
> pg actually spent doing a given thing.
>
> So, if an operation has an actual time of 1 ms but 100 loops, it took
> 100 ms.
>
> In the case of parallel operations, it looks like I can't realistically
> find this information anywhere?
>
> In the explain above, we see that the Nested Loop took, in total, around
> 2 million milliseconds (step 9).
> Out of that, ~7000 ms was the Hash Join and its subnodes.
> This leaves most of the time for the Gather step, which was called 2974
> times with an actual time reported as 618 ms.
> 2974 * 618 is 1837932 milliseconds, which seems reasonable.
>
> But then - how much of this time was spent in Parallel Seq Scan in step
> #16 ?
>
> 2974 * 609 ms?
>

Yeah.
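
For reference, here is a minimal standalone C sketch of that accounting
(the struct and helper names are invented for illustration, not taken
from the actual explain.depesz.com code; the numbers are the ones from
the plan above):

#include <stdio.h>

/*
 * EXPLAIN ANALYZE reports "actual time" as a per-loop average, so the
 * time attributed to a node is actual_time * loops.
 */
typedef struct NodeStats
{
    double      actual_time_ms; /* upper "actual time" bound, per loop */
    double      loops;          /* reported loop count */
} NodeStats;

static double
total_time_ms(NodeStats n)
{
    return n.actual_time_ms * n.loops;
}

int
main(void)
{
    NodeStats   gather = {618.0, 2974.0};   /* node #15: rescanned 2974 times */

    /*
     * Node #16: charge actual_time once per rescan (2974 * 609 ms), not
     * once per worker loop, since the 5 processes run concurrently
     * within each rescan; the loops column itself should read
     * 2974 * 5 = 14870.
     */
    NodeStats   pseqscan = {609.0, 2974.0};

    printf("Gather:            %.0f ms\n", total_time_ms(gather));  /* 1837932 */
    printf("Parallel Seq Scan: %.0f ms\n", total_time_ms(pseqscan));
    return 0;
}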

> Why is loops there 17 million?
>

That is wrong, and I think you have hit a bug. It should be 2974 * 5 =
14870, as you have seen in other cases. The problem is that during a
rescan we generally reinitialize the required state, but we forgot to
reinitialize the instrumentation-related memory that is used to
accumulate stats. Changing that fixes part of the problem: the parallel
node will no longer show wrong values. However, we also need to ensure
that the per-worker details are accumulated across rescans as well. The
attached patch should fix the problem you are seeing; I think it needs
some more analysis and testing to confirm that everything works in the
desired way.
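
To illustrate the double-counting, here is a standalone toy model in C
(the names are invented; this is neither PostgreSQL code nor the patch
itself): without a reset between rescans, each accumulation pass re-adds
the workers' totals from all earlier rescans, which blows the loop count
up into the millions instead of 2974 * 5 = 14870.

#include <stdio.h>

#define NPROCS    5        /* 4 workers + leader */
#define NRESCANS  2974

/* Toy stand-in for the per-worker instrumentation kept in shared memory. */
typedef struct Instr
{
    double      nloops;
} Instr;

int
main(void)
{
    Instr       shared[NPROCS] = {{0}};
    double      buggy = 0.0;
    double      fixed = 0.0;
    int         i, p;

    /* Buggy: the shared counters are never reset between rescans, so each
     * accumulation pass re-adds everything from the earlier rescans. */
    for (i = 0; i < NRESCANS; i++)
    {
        for (p = 0; p < NPROCS; p++)
            shared[p].nloops += 1;          /* one scan per process per rescan */
        for (p = 0; p < NPROCS; p++)
            buggy += shared[p].nloops;      /* accumulate into the plan node */
    }

    /* Fixed: fold the counters into the node's totals, then zero the
     * shared area before the next rescan. */
    for (p = 0; p < NPROCS; p++)
        shared[p].nloops = 0;
    for (i = 0; i < NRESCANS; i++)
    {
        for (p = 0; p < NPROCS; p++)
            shared[p].nloops += 1;
        for (p = 0; p < NPROCS; p++)
        {
            fixed += shared[p].nloops;
            shared[p].nloops = 0;           /* reset for the next rescan */
        }
    }

    printf("buggy loops: %.0f\n", buggy);   /* ~22.1 million */
    printf("fixed loops: %.0f\n", fixed);   /* 14870 */
    return 0;
}

The ~22 million from this toy model is of the same order as the 17.698
million you observed; the exact value differs because real workers do
not all run on every rescan.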

Is it possible for you to test the attached patch and see if you are
still seeing any unexpected values?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: fix_accum_instr_parallel_workers_v1.patch (application/octet-stream, 2.2 KB)
