|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|To:||Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>|
|Cc:||Georgios Kokolatos <gkokolatos(at)pm(dot)me>, pgsql-hackers(at)lists(dot)postgresql(dot)org|
|Subject:||Re: Duplicate Workers entries in some EXPLAIN plans|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com> writes:
> Okay. Does not getting as many workers as it asks for include
> sometimes getting zero, completely changing the actual output?
Yup :-(. We could possibly avoid that by running the explain
test by itself rather than in a parallel group, but I don't
especially want to add yet more non-parallelizable tests.
Meanwhile, I spent some time looking at the code, and wasn't very happy
with it. I'm on board with the general plan of redirecting EXPLAIN
output into per-worker buffers that we eventually recombine, but I think
that this particular coding is pretty unmaintainable/unextensible.
In particular, I'm really unhappy that the code is ignoring EXPLAIN's
grouping_state stack. That means that when it's formatting fields that
belong to the per-worker group, it's using the state-stack entry that
corresponds to the plan node's main level. This seems to accidentally
work, but that fact depends on a number of shaky assumptions:
* Before messing with any per-worker data, we've always emitted at
least one field in the plan node's main level, so that the state-stack
entry isn't at its initial state for the level.
* Before actually emitting the shunted-aside data, we've always emitted
a "Worker Number" field in correct format within the per-worker group,
so that the formatting state is now correct for a non-initial field.
* There is no formatting difference between any non-first fields in
a level (ie the state stack entries are basically just booleans),
so that it doesn't matter how many plan-node fields we emitted before
starting the per-worker data, so long as there was at least one, nor
does transiently abusing the plan node level's stack entry like this
break the formatting of subsequent plan-node-level fields.
Now maybe we'll never implement an output format that breaks that
last assumption, and maybe we'll never rearrange the EXPLAIN code
in a way that breaks either of the first two. But I don't like those
assumptions too much. I also didn't like the code's assumption that
all the non-text formats interpret es->indent the same.
I also noted an actual bug, which is that the patch fails regression
testing under force_parallel_mode = regress. This isn't really your
fault, because the issue is in this obscure and poorly-commented hack
* You might think we should just skip this stanza entirely when
* es->hide_workers is true, but then we'd get no sort-method output at
* all. We have to make it look like worker 0's data is top-level data.
* Currently, we only bother with that for text-format output.
Nonetheless, it's broken.
So I spent some time hacking on this and came up with the attached.
It's noticeably more verbose than your patch, but it keeps the
output-format-aware code at what seems to me to be a maintainable
arm's-length distance from the parallel-worker hacking. TEXT is
still a special case of course :-(.
This patch just covers the code, I'm not taking any position yet
about what to do about the tests. I did tweak the code to eliminate
the one formatting difference in select_parallel (ie put two spaces
after "Worker N:", which I think reads better anyhow), so it
passes check-world as it stands.
regards, tom lane
|Next Message||Melanie Plageman||2020-01-25 02:22:30||Re: Avoiding hash join batch explosions with extreme skew and weird stats|
|Previous Message||Peter Geoghegan||2020-01-25 01:16:47||Re: Memory-Bounded Hash Aggregation|