Re: Consider parallel for lateral subqueries with limit

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Consider parallel for lateral subqueries with limit
Date: 2020-12-01 13:43:38
Message-ID: CAAaqYe_ssUwJmYkdxO0oKqsrxPB0Ktndu7i5YiThjCor7+mqOg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 30, 2020 at 7:00 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> I've been investigating parallelizing certain correlated subqueries,
> and during that work stumbled across the fact that
> set_rel_consider_parallel disallows parallel query on what seems like
> a fairly simple case.
>
> Consider this query:
>
> select t.unique1
> from tenk1 t
> join lateral (select t.unique1 from tenk1 offset 0) l on true;
>
> Current set_rel_consider_parallel sets consider_parallel=false on the
> subquery rel because it has a limit/offset. That restriction makes a
> lot of sense when we have a subquery whose results conceptually need
> to be "shared" (or at least be the same) across multiple workers
> (indeed the relevant comment in that function notes that cases where
> we could prove a unique ordering would also qualify, but punts on
> implementing that due to complexity). But if the subquery is LATERAL,
> then no such conceptual restriction applies.
>
> If we change the code slightly to allow considering parallel query
> even in the face of LIMIT/OFFSET for LATERAL subqueries, then our
> query above changes from the following plan:
>
> Nested Loop
>   Output: t.unique1
>   ->  Gather
>         Output: t.unique1
>         Workers Planned: 2
>         ->  Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t
>               Output: t.unique1
>   ->  Gather
>         Output: NULL::integer
>         Workers Planned: 2
>         ->  Parallel Index Only Scan using tenk1_hundred on public.tenk1
>               Output: NULL::integer
>
> to this plan:
>
> Gather
>   Output: t.unique1
>   Workers Planned: 2
>   ->  Nested Loop
>         Output: t.unique1
>         ->  Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t
>               Output: t.unique1
>         ->  Index Only Scan using tenk1_hundred on public.tenk1
>               Output: NULL::integer
>
> The code change itself is quite simple (1 line). As far as I can tell
> we don't need to expressly check parallel safety of the limit/offset
> expressions; that appears to happen elsewhere (and that makes sense
> since the RTE_RELATION case doesn't check those clauses either).
>
> If I'm missing something about the safety of this (or any other
> issue), I'd appreciate the feedback.
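
To make the proposed relaxation concrete, here is a minimal, self-contained C model of the decision described above. It is not PostgreSQL source: the struct SubqueryInfo, its fields, and both function names are hypothetical stand-ins, and the real check in set_rel_consider_parallel looks at more conditions than this sketch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the planner knows about a subquery RTE. */
typedef struct SubqueryInfo
{
    bool has_limit_or_offset;   /* subquery carries a LIMIT and/or OFFSET */
    bool is_lateral;            /* subquery is LATERAL */
} SubqueryInfo;

/* Behavior as described in the mail: any LIMIT/OFFSET rules out parallelism. */
static bool
consider_parallel_current(const SubqueryInfo *sub)
{
    assert(sub != NULL);
    return !sub->has_limit_or_offset;
}

/* Proposed relaxation: LIMIT/OFFSET only blocks non-LATERAL subqueries. */
static bool
consider_parallel_proposed(const SubqueryInfo *sub)
{
    assert(sub != NULL);
    return sub->is_lateral || !sub->has_limit_or_offset;
}
```

In this model, a subquery flagged as both LATERAL and having LIMIT/OFFSET is rejected by the first function and accepted by the second, while a non-LATERAL subquery with LIMIT/OFFSET is rejected by both.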

Note that near the end of grouping_planner we have a similar check:

    if (final_rel->consider_parallel && root->query_level > 1 &&
        !limit_needed(parse))

guarding copying the partial paths from the current rel to the final
rel. I haven't managed to come up with a test case that exposes it,
though, since simple examples like the one above get converted into a
JOIN, so we never reach grouping_planner for a subquery. Making the
subquery above correlated does get us to that point, but the subquery
isn't currently marked as parallel safe for other reasons (because it
has params), so that's not a useful test. I'm not sure whether there
are cases that can't be converted to a join but also don't involve
params; I haven't thought about it a lot, though.
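
For reference, the guard quoted above can be modeled in isolation. This is a sketch only: the struct FinalRelState and the function copies_partial_paths are hypothetical names, and the three fields merely stand in for final_rel->consider_parallel, root->query_level, and the result of limit_needed(parse).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the planner state consulted by the guard. */
typedef struct FinalRelState
{
    bool consider_parallel; /* models final_rel->consider_parallel */
    int  query_level;       /* models root->query_level; 1 is the top level */
    bool limit_needed;      /* models the result of limit_needed(parse) */
} FinalRelState;

/* Mirrors the quoted condition: true when partial paths would be copied. */
static bool
copies_partial_paths(const FinalRelState *st)
{
    assert(st != NULL);
    return st->consider_parallel && st->query_level > 1 && !st->limit_needed;
}
```

In this model, a subquery at query level 2 that needs a LIMIT fails the guard even when consider_parallel is set, so relaxing only the check in set_rel_consider_parallel would leave this condition in effect.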

James
