Re: Consider parallel for lateral subqueries with limit

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, brian(at)brianlikespostgres(dot)com
Subject: Re: Consider parallel for lateral subqueries with limit
Date: 2022-01-13 23:26:01
Message-ID: CAAaqYe86SVE4HpuiqyjxV2Uui=5+hrF_25L_OBiDYbOsY7mXTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 4, 2022 at 9:59 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Tue, Jan 4, 2022 at 5:31 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Greg Nancarrow <gregn4422(at)gmail(dot)com> writes:
> > > The patch LGTM.
> > > I have set the status to "Ready for Committer".
> >
> > I don't really see why this patch is even a little bit safe.
> > The argument for it seems to be that a lateral subquery will
> > necessarily be executed in such a way that each complete iteration
> > of the subquery, plus joining to its outer rel, happens within a
> > single worker ... but where is the guarantee of that? Once
> > you've marked the rel as parallel-safe, the planner is free to
> > consider all sorts of parallel join structures. I'm afraid this
> > would be easily broken as soon as you look at cases with three or
> > more rels. Or maybe even just two. The reason for the existing
> > restriction boils down to this structure being unsafe:
> >
> > Gather
> > NestLoop
> > Scan ...
> > Limit
> > Scan ...
> >
> > and I don't see how the existence of a lateral reference
> > makes it any safer.
>
> Thanks for taking a look. I'm not following how the structure you
> posited is inherently unsafe when it's a lateral reference. That
> limit/scan (if lateral) has to be being executed per tuple in the
> outer scan, right? And if it's a unique execution per tuple, then
> consistency across tuples (that are in different workers) can't be a
> concern.
>
> Is there a scenario I'm missing where lateral can currently be
> executed in that way in that structure (or a different one)?

To expand on this:

Suppose lateral is not in play. Then if we have a plan like:

Gather
NestLoop
Scan X
Limit
Scan Y

Because we have the result "X join Limit(Y)" we need "Limit(Y)" to be
consistent across all of the possible executions of "Limit(Y)" (i.e.,
in each worker it executes in). That means (absent infrastructure for
guaranteeing a unique ordering) we obviously can't parallelize the
inner side of the join as the limit may be applied in different ways
in each worker's execution.

Now suppose lateral is in play. Then (given the same plan) instead of
our result being "X join Limit(Y)" the result is "X join Limit(Y sub
X)", that is, each row in X is joined to a unique invocation of
"Limit(Y)". In this case we are already conceivably getting different
results for each execution of the subquery "Limit(Y)" even if we're
not running those executions across multiple workers. Whether we've
optimized such a subquery into running a single execution per row in X
or have managed to optimize it in such a way that a single execution
of "Limit(Y)" may be shared by multiple rows in X makes no difference
because there are no relational guarantees that that is the case
(conceptually each row in X gets its own results for "Limit(Y)").

I've not been able to come up with a scenario where this doesn't hold
-- even if part of the join or the subquery execution happens in a
different worker. I believe that for there to be a parallel safety
problem here you'd need to have a given subquery execution for a row
in X be executed multiple times. Alternatively I've been trying to
reason about whether there might be a scenario where a 3rd rel is
involved (i.e., the inner rel is itself a join rel), but as far as I
can tell the moment we end up with a join structure such that the
lateral rel is the one with the limit we'd be back to being safe again
for the reasons claimed earlier.

If there's something obvious (or not so obvious) I'm missing, I'd
appreciate a counterexample, but I'm currently unable to falsify my
original claim that the lateral reference is a fundamental difference
that allows us to consider this parallel safe.

Thanks,
James Coleman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-01-13 23:58:27 Re: libpq compression (part 2)
Previous Message Zhihong Yu 2022-01-13 23:03:11 Re: null iv parameter passed to combo_init()