Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com
Date: 2017-12-05 11:02:00
Message-ID: CAA4eK1JBj4YCEQKeTua5=BMXy7zW7zNOvoXomzBP=kb_aqMF7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 5, 2017 at 1:29 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Tue, Dec 5, 2017 at 8:49 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Tue, Dec 5, 2017 at 6:39 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> I have one another observation in the somewhat related area. From the
>>> code, it looks like we might have some problem with displaying sort
>>> info for workers for rescans. I think the problem with the sortinfo
>>> is that it initializes shared info with local memory in
>>> ExecSortRetrieveInstrumentation after which it won't be able to access
>>> the values in shared memory changed by workers in rescans. We might
>>> be able to fix it by having some local_info same as sahred_info in
>>> sort node. But the main problem is how do we accumulate stats for
>>> workers across rescans. The type of sort method can change across
>>> rescans. We might be able to accumulate the size of Memory though,
>>> but not sure if that is right. I think though this appears to be
>>> somewhat related to the problem being discussed in this thread, it can
>>> be dealt separately if we want to fix it.
>>
>> Yeah, that's broken. ExecSortRetrieveInstrumentation() is run for
>> each loop, and after the first loop we've lost track of the pointer
>> into shared memory because we replaced it with palloc'd copy. We
>> could do what you said, or we could reinstate the pointer into the DSM
>> in ExecSortReInitializeDSM() by looking it up in the TOC.
>
> Or would it be an option to change the time
> ExecXXXRetrieveInstrumentation() is called so it is run only once?
>

To me, that doesn't sound like a bad option. I think if do so, then
we don't even need to reinitialize the shared sort stats. I think
something, like attached, should work if we want to go this route. We
can add regression test if this is what we think is a good idea.
Having said that, one problem I see doing thing this way is that in
general, we will display the accumulated stats of each worker, but for
sort or some other special nodes (like hash), we will show the
information of only last loop. I am not sure if that is a matter of
concern, but if we want to do this way, then probably we should
explain this in documentation as well.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
fix_accum_instr_parallel_workers_v4.patch application/octet-stream 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-12-05 11:53:22 Re: [HACKERS] kqueue
Previous Message Amit Kapila 2017-12-05 10:51:27 Re: Usage of epoch in txid_current