Re: Parallelize correlated subqueries that execute within each worker

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallelize correlated subqueries that execute within each worker
Date: 2023-01-23 14:00:31
Message-ID: CAAaqYe9V+2TTwGDUu12NLu1tDf0SnwvYnTOrKz=uTh6cLx3McA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 21, 2023 at 10:07 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> ...
> While working through Tomas's comment about a conditional in the
> max_parallel_hazard_waker being guaranteed true I realized that in the
> current version of the patch the safe_param_ids tracking in
> is_parallel_safe isn't actually needed any longer. That seemed
> suspicious, and so I started digging, and I found out that in the
> current approach all of the tests pass with only the changes in
> clauses.c. I don't believe that the other changes aren't needed;
> rather I believe there isn't yet a test case exercising them, but I
> realize that means I can't prove they're needed. I spent some time
> poking at this, but at least with my current level of imagination I
> haven't stumbled across a query that would exercise these checks.

I played with this a good bit more yesterday, I'm now a good bit more
confident this is correct. I've cleaned up the patch; see attached for
v7.

Here's some of my thought process:
The comments in src/include/nodes/pathnodes.h:2953 tell us that
PARAM_EXEC params are used to pass values around from one plan node to
another in the following ways:
1. Values down into subqueries (for outer references in subqueries)
2. Up out of subqueries (for the results of a subplan)
3. From a NestLoop plan node into its inner relation (when the inner
scan is parameterized with values from the outer relation)

Case (2) is already known to be safe (we currently add these params to
safe_param_ids in max_parallel_hazard_walker when we encounter a
SubPlan node).

I also believe case (3) is already handled. We don't build partial
paths for joins when joinrel->lateral_relids is non-empty, and join
order calculations already require that parameterization here go the
correct way (i.e., inner depends on outer rather than the other way
around).

That leaves us with only case (1) to consider in this patch. Another
way of saying this is that this is really the only thing the
safe_param_ids tracking is guarding against. For params passed down
into subqueries we can further distinguish between init plans and
"regular" subplans. We already know that params from init plans are
safe (at the right level). So we're concerned here with a way to know
if the params passed to subplans are safe. We already track required
rels in ParamPathInfo, so it's fairly simple to do this test.

Which this patch we do in fact now see (as expected) rels with
non-empty lateral_relids showing up in generate_[useful_]gather_paths.
And the partial paths can now have non-empty required outer rels. I'm
not able to come up with a plan that would actually be caught by those
checks; I theorize that because of the few places we actually call
generate_[useful_]gather_paths we are in practice already excluding
those, but for now I've left these as a conditional rather than an
assertion because it seems like the kind of guard we'd want to ensure
those methods are safe.

The other other place that we actually create gather[_merge] paths is
gather_grouping_paths(), and there I've chosen to use assertions,
because the point at which grouping happens in planning suggests to me
that we shouldn't have lateral dependencies at that point. If someone
is concerned about that, I'd be happy to change those to conditionals
also.

James Coleman

Attachment Content-Type Size
v8-0001-Add-tests-before-change.patch application/octet-stream 7.2 KB
v8-0002-Parallelize-correlated-subqueries.patch application/octet-stream 30.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-01-23 14:07:03 Re: run pgindent on a regular basis / scripted manner
Previous Message Matthias van de Meent 2023-01-23 13:54:01 Re: Improving btree performance through specializing by key shape, take 2