Re: parallelize queries containing initplans

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-08-09 04:54:40
Message-ID: CAJrrPGdbmzszySr55OFqDWnvig6UofKjZB0dBcJtZNY6JkdvtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 17, 2017 at 10:53 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Tue, Mar 28, 2017 at 7:25 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > On Thu, Mar 16, 2017 at 2:34 AM, Kuntal Ghosh
> > <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> >> On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >>> Based on that idea, I have modified the patch such that it will
> >>> compute the set of initplans Params that are required below gather
> >>> node and store them as bitmap of initplan params at gather node.
> >>> During set_plan_references, we can find the intersection of external
> >>> parameters that are required at Gather or nodes below it with the
> >>> initplans that are passed from same or above query level. Once the set
> >>> of initplan params are established, we evaluate those (if they are not
> >>> already evaluated) before execution of gather node and then pass the
> >>> computed value to each of the workers. To identify whether a
> >>> particular param is parallel safe or not, we check if the paramid of
> >>> the param exists in initplans at same or above query level. We don't
> >>> allow to generate gather path if there are initplans at some query
> >>> level below the current query level as those plans could be
> >>> parallel-unsafe or undirect correlated plans.
> >>
> >> I would like to mention different test scenarios with InitPlans that
> >> we've considered while developing and testing of the patch.
> >>
> >
> > Thanks a lot Kuntal for sharing different test scenarios.
> > Unfortunately, this patch doesn't received any review till now, so
> > there is no chance of making it in to PostgreSQL-10. I have moved
> > this to next CF.
> >
>
> Attached is a rebased version of the patch with below changes:
> a. SubplanState now directly stores Subplan rather than ExprState, so
> patch needs some adjustment in that regard.
> b. While rejecting the paths (based on if there are initplans at level
> below the current query level) for parallelism, the rejected paths
> were not marked as parallel unsafe. Due to this in
> force_parallel_mode=regress, we were able to add gather node above
> parallel unsafe paths. The modified patch ensures to mark such paths
> as parallel unsafe.
> c. Added regression test.
> d. Improve comments in the code.
>
>
I tested the latest patch and the parallel plan is getting choose for most
of
the init plans.

For the following query the parallel plan is not chosen. The query contains
an init plan that refer the outer node.

postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
t2 where t1.k = (select max(k) from t3 where t3.i=t1.i));
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..22426.28 rows=448 width=12) (actual
time=8.335..132.557 rows=2 loops=1)
Filter: (SubPlan 2)
Rows Removed by Filter: 894
SubPlan 2
-> Result (cost=16.27..31.26 rows=999 width=4) (actual
time=0.146..0.146 rows=0 loops=896)
One-Time Filter: (t1.k = $1)
InitPlan 1 (returns $1)
-> Aggregate (cost=16.25..16.27 rows=1 width=4) (actual
time=0.145..0.145 rows=1 loops=896)
-> Seq Scan on t3 (cost=0.00..16.25 rows=2 width=4)
(actual time=0.131..0.144 rows=0 loops=896)
Filter: (i = t1.i)
Rows Removed by Filter: 900
-> Seq Scan on t2 (cost=16.27..31.26 rows=999 width=4) (actual
time=0.012..0.013 rows=10 loops=2)
Planning time: 0.272 ms
Execution time: 132.623 ms
(14 rows)

If I change the query a little bit, the Result node doesn't appear and the
parallel plan
gets chosen.

postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
t2 where t2.k = (select max(k) from t3 where t3.i=t1.i));
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..19162.88 rows=448 width=12) (actual
time=3501.483..3501.483 rows=0 loops=1)
Filter: (SubPlan 2)
Rows Removed by Filter: 896
SubPlan 2
-> Gather (cost=16.27..26.47 rows=2 width=4) (actual
time=3.471..3.795 rows=0 loops=896)
Workers Planned: 2
Params Evaluated: $1
Workers Launched: 2
InitPlan 1 (returns $1)
-> Aggregate (cost=16.25..16.27 rows=1 width=4) (actual
time=0.161..0.161 rows=1 loops=896)
-> Seq Scan on t3 (cost=0.00..16.25 rows=2 width=4)
(actual time=0.144..0.156 rows=0 loops=896)
Filter: (i = t1.i)
Rows Removed by Filter: 900
-> Parallel Seq Scan on t2 (cost=0.00..10.20 rows=1 width=4)
(actual time=0.001..0.001 rows=0 loops=804608)
Filter: (k = $1)
Rows Removed by Filter: 1
Planning time: 0.480 ms
Execution time: 3502.016 ms
(18 rows)

I didn't check the code why the plan is not getting chosen.
Just shared it for your reference, whether it is a known already.

Regards,
Hari Babu
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2017-08-09 05:10:07 Re: Error : undefined symbol : LWLockAssign in 9.6.3
Previous Message Andres Freund 2017-08-09 04:33:21 Re: Error : undefined symbol : LWLockAssign in 9.6.3