Re: parallelize queries containing subplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing subplans
Date: 2017-02-14 09:24:46
Message-ID: CAA4eK1LQA-avGD=H3nNfVC8nMahozgRzEHb3KPpeppCTQ-gayg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 3, 2017 at 4:19 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> Now, we can further extend this to parallelize queries containing
>> correlated subplans like below:
>>
>> explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
>> QUERY PLAN
>> -------------------------------------------------------------
>> Seq Scan on t1 (cost=0.00..831049.09 rows=8395 width=12)
>> Filter: (SubPlan 1)
>> SubPlan 1
>> -> Seq Scan on t2 (cost=0.00..97.73 rows=493 width=4)
>> Filter: (i = t1.i)
>> (5 rows)
>>
>> In the above query, Filter on t2 (i = t1.i) generates Param node which
>> is a parallel-restricted node, so such queries won't be able to use
>> parallelism even with the patch. I think we can mark Params which
>> refer to same level as parallel-safe and I think we have this
>> information (node-> varlevelsup/ phlevelsup/ agglevelsup) available
>> when we replace correlation vars (SS_replace_correlation_vars).
>>
>
> I have implemented the above idea which will allow same or immediate
> outer level PARAMS as parallel_safe. The results of above query after
> patch:
>
> postgres=# explain select * from t1 where t1.i in (select t2.i from t2
> where t2.i=t1.i);
> QUERY PLAN
> --------------------------------------------------------------------------
> Gather (cost=0.00..488889.88 rows=8395 width=12)
> Workers Planned: 1
> -> Parallel Seq Scan on t1 (cost=0.00..488889.88 rows=4938 width=12)
> Filter: (SubPlan 1)
> SubPlan 1
> -> Seq Scan on t2 (cost=0.00..97.73 rows=493 width=4)
> Filter: (i = t1.i)
> (7 rows)
>

On further evaluation, it seems this patch has one big problem which
is that it will allow forming parallel plans which can't be supported
with current infrastructure. For ex. marking immediate level params
as parallel safe can generate below type of plan:

Seq Scan on t1
Filter: (SubPlan 1)
SubPlan 1
-> Gather
Workers Planned: 1
-> Result
One-Time Filter: (t1.k = 0)
-> Parallel Seq Scan on t2

In this plan, we can't evaluate one-time filter (that contains
correlated param) unless we have the capability to pass all kind of
PARAM_EXEC param to workers. I don't want to invest too much time in
this patch unless somebody can see some way using current parallel
infrastructure to implement correlated subplans.

Note that still, the other patch [1] in this thread which implements
parallelism for uncorrelated subplan holds good.

[1] - https://www.postgresql.org/message-id/CAA4eK1J9mDZLcp-OskkdzAf_yT8W4dBSGL9E%3DkoEiJkdpVZsEA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2017-02-14 09:33:22 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Previous Message Seki, Eiji 2017-02-14 09:13:30 Re: Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags