Re: parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-08-09 11:26:44
Message-ID: CAA4eK1+UpCfd-zO5o8Vo0ULk-KqRc_+wU3VtMCCCAGFYFTM3bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 9, 2017 at 10:24 AM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
>
>
> On Mon, Jul 17, 2017 at 10:53 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> On Tue, Mar 28, 2017 at 7:25 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > On Thu, Mar 16, 2017 at 2:34 AM, Kuntal Ghosh
>> > <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>> >> On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> >> wrote:
>> >>> Based on that idea, I have modified the patch such that it will
>> >>> compute the set of initplans Params that are required below gather
>> >>> node and store them as bitmap of initplan params at gather node.
>> >>> During set_plan_references, we can find the intersection of external
>> >>> parameters that are required at Gather or nodes below it with the
>> >>> initplans that are passed from same or above query level. Once the set
>> >>> of initplan params are established, we evaluate those (if they are not
>> >>> already evaluated) before execution of gather node and then pass the
>> >>> computed value to each of the workers. To identify whether a
>> >>> particular param is parallel safe or not, we check if the paramid of
>> >>> the param exists in initplans at same or above query level. We don't
>> >>> allow to generate gather path if there are initplans at some query
>> >>> level below the current query level as those plans could be
>> >>> parallel-unsafe or undirect correlated plans.
>> >>
>> >> I would like to mention different test scenarios with InitPlans that
>> >> we've considered while developing and testing of the patch.
>> >>
>> >
>> > Thanks a lot Kuntal for sharing different test scenarios.
>> > Unfortunately, this patch doesn't received any review till now, so
>> > there is no chance of making it in to PostgreSQL-10. I have moved
>> > this to next CF.
>> >
>>
>> Attached is a rebased version of the patch with below changes:
>> a. SubplanState now directly stores Subplan rather than ExprState, so
>> patch needs some adjustment in that regard.
>> b. While rejecting the paths (based on if there are initplans at level
>> below the current query level) for parallelism, the rejected paths
>> were not marked as parallel unsafe. Due to this in
>> force_parallel_mode=regress, we were able to add gather node above
>> parallel unsafe paths. The modified patch ensures to mark such paths
>> as parallel unsafe.
>> c. Added regression test.
>> d. Improve comments in the code.
>>
>
> I tested the latest patch and the parallel plan is getting choose for most
> of
> the init plans.
>

Thanks for looking into this patch.

> For the following query the parallel plan is not chosen. The query contains
> an init plan that refer the outer node.
>

We don't want to generate the parallel plan for such cases. Whenever
initplan refers to any outer node (aka correlated plan), it won't
generate a parallel plan. Also, for t2, it doesn't choose a parallel
plan because one-time filter refers to the outer node (again
correlated plan case). Basically, till now we don't support parallel
plan for any case where the correlated plan is used. So, it is
perfectly valid that it doesn't use parallel plan here.

> postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
> t2 where t1.k = (select max(k) from t3 where t3.i=t1.i));
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
> Seq Scan on t1 (cost=0.00..22426.28 rows=448 width=12) (actual
> time=8.335..132.557 rows=2 loops=1)
> Filter: (SubPlan 2)
> Rows Removed by Filter: 894
> SubPlan 2
> -> Result (cost=16.27..31.26 rows=999 width=4) (actual
> time=0.146..0.146 rows=0 loops=896)
> One-Time Filter: (t1.k = $1)
> InitPlan 1 (returns $1)
> -> Aggregate (cost=16.25..16.27 rows=1 width=4) (actual
> time=0.145..0.145 rows=1 loops=896)
> -> Seq Scan on t3 (cost=0.00..16.25 rows=2 width=4)
> (actual time=0.131..0.144 rows=0 loops=896)
> Filter: (i = t1.i)
> Rows Removed by Filter: 900
> -> Seq Scan on t2 (cost=16.27..31.26 rows=999 width=4) (actual
> time=0.012..0.013 rows=10 loops=2)
> Planning time: 0.272 ms
> Execution time: 132.623 ms
> (14 rows)
>
> If I change the query a little bit, the Result node doesn't appear and the
> parallel plan
> gets chosen.
>

This is a valid case for choosing a parallel plan.

> postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
> t2 where t2.k = (select max(k) from t3 where t3.i=t1.i));
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
> Seq Scan on t1 (cost=0.00..19162.88 rows=448 width=12) (actual
> time=3501.483..3501.483 rows=0 loops=1)
> Filter: (SubPlan 2)
> Rows Removed by Filter: 896
> SubPlan 2
> -> Gather (cost=16.27..26.47 rows=2 width=4) (actual
> time=3.471..3.795 rows=0 loops=896)
> Workers Planned: 2
> Params Evaluated: $1
> Workers Launched: 2
> InitPlan 1 (returns $1)
> -> Aggregate (cost=16.25..16.27 rows=1 width=4) (actual
> time=0.161..0.161 rows=1 loops=896)
> -> Seq Scan on t3 (cost=0.00..16.25 rows=2 width=4)
> (actual time=0.144..0.156 rows=0 loops=896)
> Filter: (i = t1.i)
> Rows Removed by Filter: 900
> -> Parallel Seq Scan on t2 (cost=0.00..10.20 rows=1 width=4)
> (actual time=0.001..0.001 rows=0 loops=804608)
> Filter: (k = $1)
> Rows Removed by Filter: 1
> Planning time: 0.480 ms
> Execution time: 3502.016 ms
> (18 rows)
>

Here if you notice the parallel node t2 refers to the initplan which
can be parallelised after this patch. Basically, whenever the
initplan is attached at or above Gather node, we compute its value and
pass down to workers.

> I didn't check the code why the plan is not getting chosen.
> Just shared it for your reference, whether it is a known already.
>

Yeah, it is known the behavior of the patch.

By the way, the patch doesn't apply on HEAD, so attached rebased patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
pq_pushdown_initplan_v5.patch application/octet-stream 37.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-09 12:03:54 Re: dubious error message from partition.c
Previous Message Kuntal Ghosh 2017-08-09 10:54:10 Re: parallelize queries containing initplans