Re: parallelize queries containing subplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing subplans
Date: 2017-01-12 02:58:01
Message-ID: CAA4eK1Kfe0NrCRncGmqYvvhBpr8WqQtgfPBnbcZk=AtU4ixWXA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 10, 2017 at 10:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 28, 2016 at 1:17 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Currently, queries that have references to SubPlans or
>> AlternativeSubPlans are considered parallel-restricted. I think we
>> can lift this restriction in many cases especially when SubPlans are
>> parallel-safe. To make this work, we need to propagate the
>> parallel-safety information from path node to plan node and the same
>> could be easily done while creating a plan. Another option could be
>> that instead of propagating parallel-safety information from path to
>> plan, we can find out from the plan if it is parallel-safe (doesn't
>> contain any parallel-aware node) by traversing whole plan tree, but I
>> think it is a waste of cycles. Once we have parallel-safety
>> information in the plan, we can use that for detection of
>> parallel-safe expressions in max_parallel_hazard_walker(). Finally,
>> we can pass all the subplans to workers during plan serialization in
>> ExecSerializePlan(). This will enable workers to execute subplans
>> that are referenced in the parallel part of the plan. Now, we might be
>> able to optimize it such that we pass only the subplans that are
>> referenced in the parallel portion of the plan, but I am not sure it is
>> worth the trouble because it is a one-time cost and much smaller than
>> the other things we do (like creating dsm, launching workers).
>
> It seems unfortunate to have to add a parallel_safe flag to the
> finished plan; the whole reason we have the Path-Plan distinction is
> so that we can throw away information that won't be needed at
> execution time. The parallel_safe flag is, in fact, not needed at
> execution time, but just for further planning. Isn't there some way
> that we can remember, at the time when a sublink is converted to a
> subplan, whether or not the subplan was created from a parallel-safe
> path?
>

The other alternative is to remember this information in the SubPlan. We
can retrieve the parallel_safe flag from best_path and use it while
generating the SubPlan. The main reason for storing it in the Plan was to
avoid passing parallel_safe explicitly while generating the SubPlan, as
the plan was already available at that point. However, it seems there are
only two places in the code (see build_subplan) where this information
needs to be propagated. Let me know whether you prefer to remember the
parallel_safe information in SubPlan instead of in Plan, or whether you
have something else in mind.
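
To make the SubPlan alternative concrete, here is a rough standalone
sketch, not the actual patch. The types SketchPath and SketchSubPlan and
both functions are hypothetical, simplified stand-ins for the real planner
structures; the only assumption carried over from the discussion is that
the chosen path has a parallel_safe flag and that the SubPlan would gain
one too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the chosen path (best_path) of a sublink. */
typedef struct SketchPath
{
    bool parallel_safe;     /* computed while the path was generated */
} SketchPath;

/* Hypothetical stand-in for SubPlan; parallel_safe is the assumed new field. */
typedef struct SketchSubPlan
{
    int  plan_id;
    bool parallel_safe;     /* remembered from the path, not from the Plan */
} SketchSubPlan;

/*
 * Copy the flag from the path at the time the subplan is built, so the
 * finished Plan does not need to carry it.
 */
SketchSubPlan
sketch_build_subplan(const SketchPath *best_path, int plan_id)
{
    SketchSubPlan sp;

    assert(best_path != NULL);
    sp.plan_id = plan_id;
    sp.parallel_safe = best_path->parallel_safe;
    return sp;
}

/*
 * What a hazard check could look like: a subplan reference is treated as
 * restricted only when the subplan was not built from a parallel-safe path.
 */
bool
sketch_subplan_is_parallel_restricted(const SketchSubPlan *sp)
{
    assert(sp != NULL);
    return !sp->parallel_safe;
}
```

With something along these lines, a check such as the one in
max_parallel_hazard_walker() would only need to look at the SubPlan node
itself, and the flag would not have to live in the finished Plan.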

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
