Re: Parallelize correlated subqueries that execute within each worker

From: James Coleman <jtc331(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallelize correlated subqueries that execute within each worker
Date: 2023-06-12 02:23:45
Message-ID: CAAaqYe-_TObm5KwmZLYXBJ3BJGh4cUZWM0v1mY1gWTMkRNQXDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 6, 2023 at 4:36 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
>
> On Mon, Jan 23, 2023 at 10:00 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>>
>> Which this patch we do in fact now see (as expected) rels with
>> non-empty lateral_relids showing up in generate_[useful_]gather_paths.
>> And the partial paths can now have non-empty required outer rels. I'm
>> not able to come up with a plan that would actually be caught by those
>> checks; I theorize that because of the few places we actually call
>> generate_[useful_]gather_paths we are in practice already excluding
>> those, but for now I've left these as a conditional rather than an
>> assertion because it seems like the kind of guard we'd want to ensure
>> those methods are safe.
>
>
> I'm trying to understand this part. AFAICS we will not create partial
> paths for a rel, base or join, if it has lateral references. So it
> seems to me that in generate_[useful_]gather_paths after we've checked
> that there are partial paths, the checks for lateral_relids are not
> necessary because lateral_relids should always be empty in this case.
> Maybe I'm missing something.

At first I was thinking "isn't the point of the patch to generate
partial paths for rels with lateral references" given what I'd written
back in January, but I added "Assert(bms_is_empty(required_outer));"
to both of those functions and the assertion never fails running the
tests (including my newly parallelizable queries). I'm almost positive
I'd checked this back in January (not only had I'd explicitly written
that I'd confirmed we had non-empty lateral_relids there, but also it
was the entire based of the alternate approach to the patch), but...I
can't go back to 5 months ago and remember what I'd done.

Ah! Your comment about "after we've checked that there are partial
paths" triggered a thought. I think originally I'd had the
"bms_is_subset(required_outer, rel->relids)" check first in these
functions. And indeed if I run the tests with that the assertion moved
to above the partial paths check, I get failures in
generate_useful_gather_paths specifically. Mystery solved!

> And while trying the v9 patch I came across a crash with the query
> below.
>
> set min_parallel_table_scan_size to 0;
> set parallel_setup_cost to 0;
> set parallel_tuple_cost to 0;
>
> explain (costs off)
> select * from pg_description t1 where objoid in
> (select objoid from pg_description t2 where t2.description = t1.description);
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on pg_description t1
> Filter: (SubPlan 1)
> SubPlan 1
> -> Gather
> Workers Planned: 2
> -> Parallel Seq Scan on pg_description t2
> Filter: (description = t1.description)
> (7 rows)
>
> select * from pg_description t1 where objoid in
> (select objoid from pg_description t2 where t2.description = t1.description);
> WARNING: terminating connection because of crash of another server process
>
> Seems something is wrong when extracting the argument from the Param in
> parallel worker.

With what I'm trying to change I don't think this plan should ever be
generated since it means we'd have to pass a param from the outer seq
scan into the parallel subplan, which we can't do (currently).

I've attached the full backtrace to the email, but as you hinted at
the parallel worker is trying to get the param (in this case
detoasting it), but the param doesn't exist on the worker, so it seg
faults. Looking at this further I think there's an existing test case
that exposes the misplanning here (the one right under the comment
"Parallel Append is not to be used when the subpath depends on the
outer param" in select_parallel.sql), but it doesn't seg fault because
the param is an integer, doesn't need to be detoasted, and therefore
(I think) we skate by (but probably with wrong results in depending on
the dataset).

Interestingly this is one of the existing test queries my original
patch approach didn't change, so this gives me something specific to
work with improving the path. Thanks for testing this and bringing
this to my attention!

BTW are you by any chance testing on ARM macOS? I reproduced the issue
there, but for some reason I did not reproduce the error (and the plan
wasn't parallelized) when I tested this on linux. Perhaps I missed
setting something up; it seems odd.

> BTW another rebase is needed as it no longer applies to HEAD.

Apologies; I'd rebased, but hadn't updated the thread. See attached
for an updated series (albeit still broken on your test query).

Thanks,
James

Attachment Content-Type Size
backtrace.txt text/plain 2.9 KB
v10-0001-Add-tests-before-change.patch application/octet-stream 7.2 KB
v10-0002-Parallelize-correlated-subqueries.patch application/octet-stream 30.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2023-06-12 02:44:00 Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
Previous Message Tatsuo Ishii 2023-06-12 02:10:56 Re: make_ctags: use -I option to ignore pg_node_attr macro