Re: Duplicate Workers entries in some EXPLAIN plans

From: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Duplicate Workers entries in some EXPLAIN plans
Date: 2019-11-18 23:39:33
Message-ID: CAOtHd0Cux3rNRVNpWiSxLGWCZk=Uzb_n=5g1Q0LODqnin-dsaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 24, 2019 at 6:48 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Unfortunately I think the fix isn't all that trivial, due to the way we
> output the per-worker information at the end of ExplainNode(), by just
> dumping things into a string. It seems to me that a step in the right
> direction would be for ExplainNode() to create
> planstate->worker_instrument StringInfos, which can be handed to
> routines like show_sort_info(), which would print the per-node
> information into that, rather than directly dumping into
> es->output. Most of the current "Show worker detail" would be done
> earlier in ExplainNode(), at the place where we current display the
> "actual rows" bit.
>
> ISTM that should include removing the duplication fo the the contents of
> show_sort_info(), and probably also for the Gather, GatherMerge blocks
> (I've apparently skipped adding the JIT information to the latter, not
> sure if we ought to fix that in the stable branches).
>
> Any chance you want to take a stab at that?

It took me a while, but I did take a stab at it (thanks for your
off-list help). Attached is my patch that changes the structured
formats to merge sort worker output in with costs/timing/buffers
worker output. I have not touched any other worker output yet, since
it's not under a Workers group as far as I can tell (so it does not
exhibit the problem I originally reported). I can try to make further
changes here if the approach is deemed sound. I also have not touched
text output; above you had proposed

> Sort Method: external merge Disk: 4920kB
> Buffers: shared hit=682 read=10188, temp read=1415 written=2101
> Worker 0: actual time=130.058..130.324 rows=1324 loops=1
> Sort Method: external merge Disk: 5880kB
> Buffers: shared hit=337 read=3489, temp read=505 written=739
> Worker 1: actual time=130.273..130.512 rows=1297 loops=1
> Buffers: shared hit=345 read=3507, temp read=505 written=744
> Sort Method: external merge Disk: 5920kB

which makes sense to me, but I'd like to confirm this is the approach
we want before I add it to the patch.

This is my first C in close to a decade (and I was never much of a C
programmer to begin with), so please be gentle.

As Andres suggested off-list, I also changed the worker output to
order fields that also occur in the parent node in the same way as the
parent node.

I've also added a test for the patch, and because this is really an
EXPLAIN issue rather than a query feature issue, I added a
src/test/regress/sql/explain.sql for the test. I added a couple of
utility functions for munging json-formatted EXPLAIN plans into
something we can repeatably verify in regression tests (the functions
use json rather than jsonb to preserve field order). I have not added
this for YAML or XML (even though they should behave the same way),
since I'm not familiar with the the functions to manipulate those data
types in a similar way (if they exist). My hunch is due to the
similarity of structured formats, just testing JSON is enough, but I
can expand/adjust tests as necessary.

Attachment Content-Type Size
merge-explain-worker-output.patch text/x-patch 18.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-11-18 23:58:58 Re: Postgres on IBM z/OS 2.2.0 and 2.3.0
Previous Message Tom Lane 2019-11-18 23:09:05 Re: Invisible PROMPT2