Re: Duplicate Workers entries in some EXPLAIN plans

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
Cc: Georgios Kokolatos <gkokolatos(at)pm(dot)me>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Duplicate Workers entries in some EXPLAIN plans
Date: 2020-01-25 02:14:55
Message-ID: 15300.1579918495@sss.pgh.pa.us
Lists: pgsql-hackers

Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com> writes:
> Okay. Does not getting as many workers as it asks for include
> sometimes getting zero, completely changing the actual output?

Yup :-(. We could possibly avoid that by running the explain
test by itself rather than in a parallel group, but I don't
especially want to add yet more non-parallelizable tests.

Meanwhile, I spent some time looking at the code, and wasn't very happy
with it. I'm on board with the general plan of redirecting EXPLAIN
output into per-worker buffers that we eventually recombine, but I think
that this particular coding is pretty unmaintainable/unextensible.
In particular, I'm really unhappy that the code is ignoring EXPLAIN's
grouping_stack. That means that when it's formatting fields that
belong to the per-worker group, it's using the state-stack entry that
corresponds to the plan node's main level. This seems to accidentally
work, but that fact depends on a number of shaky assumptions:

* Before messing with any per-worker data, we've always emitted at
least one field in the plan node's main level, so that the state-stack
entry isn't at its initial state for the level.

* Before actually emitting the shunted-aside data, we've always emitted
a "Worker Number" field in correct format within the per-worker group,
so that the formatting state is now correct for a non-initial field.

* There is no formatting difference between any non-first fields in
a level (ie the state stack entries are basically just booleans),
so that it doesn't matter how many plan-node fields we emitted before
starting the per-worker data, so long as there was at least one, nor
does transiently abusing the plan node level's stack entry like this
break the formatting of subsequent plan-node-level fields.

Now maybe we'll never implement an output format that breaks that
last assumption, and maybe we'll never rearrange the EXPLAIN code
in a way that breaks either of the first two. But I don't like those
assumptions too much. I also didn't like the code's assumption that
all the non-text formats interpret es->indent the same.
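To make that last assumption concrete, here is a hypothetical standalone
sketch (simplified made-up code, not explain.c's real output machinery) of a
JSON-style emitter in which each nesting level's grouping state really is
just a boolean -- "has a field been printed at this level yet?" -- consulted
only to decide whether a comma separator is needed:

    /*
     * Hypothetical sketch, not explain.c: per-level grouping state is a
     * single boolean used only to choose between "first field" and
     * "subsequent field" separators.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_DEPTH 16

    static bool have_field[MAX_DEPTH]; /* one boolean per nesting level */
    static int  depth = 0;

    static void
    emit_field(const char *name, const char *value)
    {
        printf("%s%*s\"%s\": \"%s\"",
               have_field[depth] ? ",\n" : "\n", depth * 2, "", name, value);
        have_field[depth] = true;
    }

    static void
    open_group(const char *name)
    {
        printf("%s%*s\"%s\": {",
               have_field[depth] ? ",\n" : "\n", depth * 2, "", name);
        have_field[depth] = true;   /* the group counts as a field here */
        depth++;
        have_field[depth] = false;  /* nothing printed at the new level yet */
    }

    static void
    close_group(void)
    {
        depth--;
        printf("\n%*s}", depth * 2, "");
    }

    int
    main(void)
    {
        printf("{");
        depth = 1;
        have_field[depth] = false;

        emit_field("Node Type", "Sort");
        emit_field("Sort Method", "external merge");
        open_group("Worker 0");             /* per-worker group */
        emit_field("Sort Method", "quicksort");
        close_group();
        emit_field("Sort Space Used", "4096");

        printf("\n}\n");
        return 0;
    }

Transiently reusing the plan node level's entry while printing per-worker
fields comes out right in a scheme like this only because the state carries
nothing beyond that "not first" bit; a format whose per-level state were any
richer would be tripped up by the abuse.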

I also noted an actual bug, which is that the patch fails regression
testing under force_parallel_mode = regress. This isn't really your
fault, because the issue is in this obscure and poorly-commented hack
in show_sort_info:

* You might think we should just skip this stanza entirely when
* es->hide_workers is true, but then we'd get no sort-method output at
* all. We have to make it look like worker 0's data is top-level data.
* Currently, we only bother with that for text-format output.

Nonetheless, it's broken.
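
For illustration only, here is a self-contained caricature of the shape of
that stanza (made-up types and output, not the real show_sort_info): the
"present worker 0's data as top-level data" trick lives solely in the
text-format branch, so it is sensitive to exactly how and where worker data
ends up being buffered.

    /*
     * Caricature only, not PostgreSQL code: when workers are hidden,
     * worker 0's sort data is shown as though it were the leader's own,
     * but only the text-format path bothers to do that.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { FORMAT_TEXT, FORMAT_JSON } OutputFormat;

    typedef struct
    {
        const char *sort_method;
        long        space_used_kb;
    } WorkerSortStats;

    static void
    show_worker_sort_stats(bool hide_workers, OutputFormat format,
                           const WorkerSortStats *workers, int nworkers)
    {
        if (hide_workers && format == FORMAT_TEXT)
        {
            /* pretend worker 0's data is top-level (leader) data */
            printf("Sort Method: %s  Memory: %ldkB\n",
                   workers[0].sort_method, workers[0].space_used_kb);
            return;
        }

        for (int n = 0; n < nworkers; n++)
        {
            if (format == FORMAT_TEXT)
                printf("Worker %d:  Sort Method: %s  Memory: %ldkB\n",
                       n, workers[n].sort_method, workers[n].space_used_kb);
            else
                printf("{\"Worker Number\": %d, \"Sort Method\": \"%s\", "
                       "\"Sort Space Used\": %ld}\n",
                       n, workers[n].sort_method, workers[n].space_used_kb);
        }
        /* no worker-hiding treatment at all on the non-text side */
    }

    int
    main(void)
    {
        WorkerSortStats w[] = {{"quicksort", 256}, {"external merge", 2048}};

        show_worker_sort_stats(true, FORMAT_TEXT, w, 2);
        show_worker_sort_stats(true, FORMAT_JSON, w, 2);
        return 0;
    }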

So I spent some time hacking on this and came up with the attached.
It's noticeably more verbose than your patch, but it keeps the
output-format-aware code at what seems to me to be a maintainable
arm's-length distance from the parallel-worker hacking. TEXT is
still a special case of course :-(.
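
To show the kind of separation I mean, here's a hypothetical, self-contained
sketch (minimal made-up types, not the attached patch): the format-aware code
writes through a single "current buffer" pointer and never knows about
workers, while one small worker-aware piece swaps per-worker buffers -- each
carrying its own formatting state -- in and out, and splices them back in
afterwards.

    /*
     * Hypothetical sketch of the buffering idea, not the actual patch:
     * per-worker output accumulates in its own buffer with its own
     * formatting state; only open_worker/close_worker/flush_workers
     * know that workers exist.
     */
    #include <stdio.h>

    #define MAX_WORKERS 8
    #define BUF_SIZE    1024

    typedef struct
    {
        char  str[BUF_SIZE];        /* output accumulated so far */
        int   len;
        int   indent;               /* per-buffer formatting state */
    } OutBuf;

    typedef struct
    {
        OutBuf  main;               /* the plan node's own output */
        OutBuf  worker[MAX_WORKERS];
        OutBuf *cur;                /* where emit_line() currently writes */
    } ExplainSketch;

    /* format-aware code writes through here and never looks at workers */
    static void
    emit_line(ExplainSketch *es, const char *text)
    {
        es->cur->len += snprintf(es->cur->str + es->cur->len,
                                 BUF_SIZE - es->cur->len,
                                 "%*s%s\n", es->cur->indent * 2, "", text);
    }

    /* the only worker-aware code: redirect output, then put it back */
    static void
    open_worker(ExplainSketch *es, int n)
    {
        OutBuf *w = &es->worker[n];

        es->cur = w;
        if (w->len == 0)
        {
            char label[32];

            /* first time: emit the heading, then indent the fields under it */
            w->indent = es->main.indent + 1;
            snprintf(label, sizeof(label), "Worker %d:", n);
            emit_line(es, label);
            w->indent = es->main.indent + 2;
        }
    }

    static void
    close_worker(ExplainSketch *es)
    {
        es->cur = &es->main;
    }

    /* recombine: append each non-empty worker buffer after the node's fields */
    static void
    flush_workers(ExplainSketch *es)
    {
        for (int n = 0; n < MAX_WORKERS; n++)
            if (es->worker[n].len > 0)
                fputs(es->worker[n].str, stdout);
    }

    int
    main(void)
    {
        ExplainSketch es = {0};

        es.cur = &es.main;
        emit_line(&es, "Sort");
        open_worker(&es, 0);
        emit_line(&es, "Sort Method: quicksort");
        close_worker(&es);
        emit_line(&es, "Sort Space Used: 4096kB");

        fputs(es.main.str, stdout);
        flush_workers(&es);
        return 0;
    }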

This patch just covers the code; I'm not taking any position yet
about what to do about the tests. I did tweak the code to eliminate
the one formatting difference in select_parallel (ie put two spaces
after "Worker N:", which I think reads better anyhow), so it
passes check-world as it stands.

Thoughts?

regards, tom lane

Attachment Content-Type Size
merge-explain-worker-output-v6.patch text/x-diff 23.2 KB
