Re: Parallelize correlated subqueries that execute within each worker

From: James Coleman <jtc331(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallelize correlated subqueries that execute within each worker
Date: 2023-07-04 01:26:44
Message-ID: CAAaqYe8F=TNEF7Dk98dgLkksBt9T0QweTZ4kfif9+q5MKtxQ0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 11, 2023 at 10:23 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> ...
> > And while trying the v9 patch I came across a crash with the query
> > below.
> >
> > set min_parallel_table_scan_size to 0;
> > set parallel_setup_cost to 0;
> > set parallel_tuple_cost to 0;
> >
> > explain (costs off)
> > select * from pg_description t1 where objoid in
> > (select objoid from pg_description t2 where t2.description = t1.description);
> > QUERY PLAN
> > --------------------------------------------------------
> > Seq Scan on pg_description t1
> > Filter: (SubPlan 1)
> > SubPlan 1
> > -> Gather
> > Workers Planned: 2
> > -> Parallel Seq Scan on pg_description t2
> > Filter: (description = t1.description)
> > (7 rows)
> >
> > select * from pg_description t1 where objoid in
> > (select objoid from pg_description t2 where t2.description = t1.description);
> > WARNING: terminating connection because of crash of another server process
> >
> > Seems something is wrong when extracting the argument from the Param in
> > parallel worker.
>
> With what I'm trying to change I don't think this plan should ever be
> generated since it means we'd have to pass a param from the outer seq
> scan into the parallel subplan, which we can't do (currently).
>
> I've attached the full backtrace to the email, but as you hinted at
> the parallel worker is trying to get the param (in this case
> detoasting it), but the param doesn't exist on the worker, so it seg
> faults. Looking at this further I think there's an existing test case
> that exposes the misplanning here (the one right under the comment
> "Parallel Append is not to be used when the subpath depends on the
> outer param" in select_parallel.sql), but it doesn't seg fault because
> the param is an integer, doesn't need to be detoasted, and therefore
> (I think) we skate by (but probably with wrong results in depending on
> the dataset).
>
> Interestingly this is one of the existing test queries my original
> patch approach didn't change, so this gives me something specific to
> work with improving the path. Thanks for testing this and bringing
> this to my attention!

Here's what I've found debugging this:

There's only a single gather path ever created when planning this
query, making it easy to know which one is the problem. That gather
path is created with this stacktrace:

frame #0: 0x0000000105291590
postgres`create_gather_path(root=0x000000013081ae78,
rel=0x000000013080c8e8, subpath=0x000000013081c080,
target=0x000000013081c8c0, required_outer=0x0000000000000000,
rows=0x0000000000000000) at pathnode.c:1971:2
frame #1: 0x0000000105208e54
postgres`generate_gather_paths(root=0x000000013081ae78,
rel=0x000000013080c8e8, override_rows=false) at allpaths.c:3097:4
frame #2: 0x00000001052090ec
postgres`generate_useful_gather_paths(root=0x000000013081ae78,
rel=0x000000013080c8e8, override_rows=false) at allpaths.c:3241:2
frame #3: 0x0000000105258754
postgres`apply_scanjoin_target_to_paths(root=0x000000013081ae78,
rel=0x000000013080c8e8, scanjoin_targets=0x000000013081c978,
scanjoin_targets_contain_srfs=0x0000000000000000,
scanjoin_target_parallel_safe=true, tlist_same_exprs=true) at
planner.c:7696:3
frame #4: 0x00000001052533cc
postgres`grouping_planner(root=0x000000013081ae78, tuple_fraction=0.5)
at planner.c:1611:3
frame #5: 0x0000000105251e9c
postgres`subquery_planner(glob=0x00000001308188d8,
parse=0x000000013080caf8, parent_root=0x000000013080cc38,
hasRecursion=false, tuple_fraction=0.5) at planner.c:1062:2
frame #6: 0x000000010526b134
postgres`make_subplan(root=0x000000013080cc38,
orig_subquery=0x000000013080ff58, subLinkType=ANY_SUBLINK,
subLinkId=0, testexpr=0x000000013080d848, isTopQual=true) at
subselect.c:221:12
frame #7: 0x0000000105268b8c
postgres`process_sublinks_mutator(node=0x000000013080d6d8,
context=0x000000016b0998f8) at subselect.c:1950:10
frame #8: 0x0000000105268ad8
postgres`SS_process_sublinks(root=0x000000013080cc38,
expr=0x000000013080d6d8, isQual=true) at subselect.c:1923:9
frame #9: 0x00000001052527b8
postgres`preprocess_expression(root=0x000000013080cc38,
expr=0x000000013080d6d8, kind=0) at planner.c:1169:10
frame #10: 0x0000000105252954
postgres`preprocess_qual_conditions(root=0x000000013080cc38,
jtnode=0x000000013080d108) at planner.c:1214:14
frame #11: 0x0000000105251580
postgres`subquery_planner(glob=0x00000001308188d8,
parse=0x0000000137010d68, parent_root=0x0000000000000000,
hasRecursion=false, tuple_fraction=0) at planner.c:832:2
frame #12: 0x000000010525042c
postgres`standard_planner(parse=0x0000000137010d68,
query_string="explain (costs off)\nselect * from pg_description t1
where objoid in\n (select objoid from pg_description t2 where
t2.description = t1.description);", cursorOptions=2048,
boundParams=0x0000000000000000) at planner.c:411:9

There aren't any lateral markings on the rels. Additionally the
partial path has param_info=null (I found out from Tom in a separate
thread [1] that this is only set for outer relations from the same
query level).

The only param that I could easily find at first was a single param of
type PARAM_EXTERN in root->plan_params in make_subplan().

I spent a lot of time trying to figure out where we could find the
PARAM_EXEC param that's being fed into the subplan, but it doesn't
seem like we have access to any of these things at the point in the
path creation process that it's interesting to us when inserting the
gather nodes.

Given all of that I settled on this approach:
1. Modify is_parallel_safe() to by default ignore PARAM_EXEC params.
2. Add is_parallel_safe_with_params() that checks for the existence of
such params.
3. Store the required params in a bitmapset on each base rel.
4. Union the bitmapset on join rels.
5. Only insert a gather node if that bitmapset is empty.

I have an intuition that there's some spot (e.g. joins) that we should
be removing params from this set (e.g., when we've satisfied them),
but I haven't been able to come up with such a scenario as yet.

The attached v11 fixes the issue you reported.

Thanks,
James Coleman

Attachment Content-Type Size
v11-0002-Parallelize-correlated-subqueries.patch application/octet-stream 27.5 KB
v11-0001-Add-tests-before-change.patch application/octet-stream 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-07-04 01:30:56 RE: [PGdocs] fix description for handling pf non-ASCII characters
Previous Message Tom Lane 2023-07-04 01:20:29 Re: pgindent (probably my missing something obvious)