Re: parallelize queries containing subplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing subplans
Date: 2017-02-15 06:07:45
Message-ID: CAA4eK1KqSnq5GFvNJC6gZkfnBNxGfnZq==NDeJFDskitncV=zA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 15, 2017 at 4:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Feb 14, 2017 at 4:24 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On further evaluation, it seems this patch has one big problem: it
>> allows forming parallel plans that can't be supported with the
>> current infrastructure. For example, marking immediate-level params
>> as parallel safe can generate a plan like the one below:
>>
>> Seq Scan on t1
>>   Filter: (SubPlan 1)
>>   SubPlan 1
>>     ->  Gather
>>           Workers Planned: 1
>>           ->  Result
>>                 One-Time Filter: (t1.k = 0)
>>                 ->  Parallel Seq Scan on t2
>>
>>
>> In this plan, we can't evaluate the one-time filter (which contains
>> a correlated param) unless we have the capability to pass all kinds
>> of PARAM_EXEC params to workers. I don't want to invest too much time
>> in this patch unless somebody can see a way to implement correlated
>> subplans using the current parallel infrastructure.
>
> I don't think this approach has much chance of working; it just seems
> too simplistic. I'm not entirely sure what the right approach is.
> Unfortunately, the current query planner code seems to compute the
> sets of parameters that are set and used quite late, and really only
> on a per-subquery level.
>

Just for the sake of discussion: even if we had the list of allParams
available at the path level, I think it still might not be easy to
make this work.

> Here we need to know whether there is
> anything that's set below the Gather node and used above it, or the
> other way around, and we need to know it much earlier, while we're
> still doing path generation.
>

Yes, that is exactly the challenge. I am not sure whether there is
currently a way to identify whether a Param on a particular node
refers to a node below it or above it.
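
To make the question concrete, here is a toy model in Python. It is
not PostgreSQL code: the Node class, its sets/uses fields and the
"$0" param name are all made up for illustration. It only shows the
classification we would need at a Gather node, namely which params
are used below it but set elsewhere, and which are set below it but
used elsewhere.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical plan node: records which params it assigns and reads."""
    name: str
    sets: set = field(default_factory=set)
    uses: set = field(default_factory=set)
    children: list = field(default_factory=list)

def collect(node, skip=None):
    """Return (params set, params used) in node's subtree, ignoring `skip`."""
    if node is skip:
        return set(), set()
    sets, uses = set(node.sets), set(node.uses)
    for child in node.children:
        child_sets, child_uses = collect(child, skip)
        sets |= child_sets
        uses |= child_uses
    return sets, uses

def classify(root, gather):
    """Return (params workers need from outside, params leaking out of Gather)."""
    below_sets, below_uses = set(), set()
    for child in gather.children:
        child_sets, child_uses = collect(child)
        below_sets |= child_sets
        below_uses |= child_uses
    _, outside_uses = collect(root, skip=gather)
    needed_from_outside = below_uses - below_sets
    set_below_used_outside = below_sets & outside_uses
    return needed_from_outside, set_below_used_outside

# Rough shape of the plan quoted above: the outer scan supplies t1.k as
# "$0", and the Result under the Gather reads it in its one-time filter.
scan_t2 = Node("Parallel Seq Scan on t2")
result = Node("Result", uses={"$0"}, children=[scan_t2])
gather = Node("Gather", children=[result])
root = Node("Seq Scan on t1", sets={"$0"}, children=[gather])

needed, leaking = classify(root, gather)
```

Here "$0" ends up in the first set, which is the case the current
infrastructure can't handle. In the planner we would need this
information during path generation, which is the hard part.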

> (There are also possibly a couple of other cases, like an initPlan
> that needs to get executed only once, and also maybe a case where a
> parameter is set below the Gather and later used above the Gather.
> Not sure if that latter one can happen, or how to deal with it.)
>

I think the initPlan case is slightly different, because we can
always evaluate an initPlan at the Gather node (provided it is
uncorrelated) and then pass its value to the workers. We generally
have a list of all the params at each plan node, so we can identify
which of them are initPlan params and evaluate those. The computed
value can then be used irrespective of whether the param is referenced
above or below the Gather node. Where it is used above the Gather
node, this works because we always store the computed value of such
params in the estate/econtext; where it has to be used below the
Gather node, we need to pass the computed value to the workers. There
are some exceptions: in a few cases not all the params are available
at a particular node, but I feel those can be handled easily, either
by traversing the planstate tree or by storing the params at the
Gather node itself. In short, this is what the patch proposed for
parallelizing initplans [1] does.
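
As a rough illustration of that flow (again a toy Python sketch, not
the patch itself: launch_gather, the dict standing in for the
estate/econtext, and the "$1" param are all hypothetical), the leader
evaluates each uncorrelated initPlan once, keeps the value for use
above the Gather, and ships a serialized copy to each worker for use
below it:

```python
import pickle

def launch_gather(initplans, leader_params, worker_count):
    """Evaluate initPlans once in the leader and hand copies to workers.

    initplans: dict mapping a param id to a zero-argument callable.
    leader_params: dict standing in for the leader's executor state.
    Returns one param dict per worker.
    """
    for param_id, compute in initplans.items():
        if param_id not in leader_params:   # run each initPlan only once
            leader_params[param_id] = compute()
    # Serialize just the initPlan values; each worker gets its own copy.
    payload = pickle.dumps({p: leader_params[p] for p in initplans})
    return [pickle.loads(payload) for _ in range(worker_count)]

calls = []

def uncorrelated_initplan():
    """Stand-in for an initPlan; counts how often it is executed."""
    calls.append(1)
    return 42

leader = {}
workers = launch_gather({"$1": uncorrelated_initplan}, leader, worker_count=2)
```

After the call, the leader and both workers see the same value for
"$1", and the initPlan has run exactly once. A correlated param can't
be handled this way, because its value changes for every outer row.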

[1] - https://commitfest.postgresql.org/13/997/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
