Re: parallelize queries containing initplans

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-08-21 08:14:11
Message-ID: CAJrrPGew8079-m5A4+S278n32vNovCrMcBWfd1YGS5LML6sesQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 14, 2017 at 8:41 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Sun, Aug 13, 2017 at 6:49 PM, Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> > On Fri, Aug 11, 2017 at 1:18 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> >>
> >
> > Thanks for the updated patch. Patch looks fine. I just have some
> > minor comments.
> >
> > + * ExecEvalParamExecParams
> > + *
> > + * Execute the subplan stored in PARAM_EXEC initplans params, if not
> > executed
> > + * till now.
> > + */
> > +void
> > +ExecEvalParamExecParams(Bitmapset *params, EState *estate)
> >
> > I feel it is better to explain when this function executes the sub plans
> > that are
> > not executed till now? Means like in what scenario?
> >
>
> It just means that it will execute the same initplan (subplan) just
> once in master backend even if it used in multiple places. This is
> the same kind of usage as we have in ExecEvalParamExec. You can find
> its usage by using some query like
>
> explain analyse select sum(t1.i) from t1, t2 where t1.j=t2.j and t1.k
> = (select count(k) from t3) and t1.k=t2.k;
>
> Ensure you insert some rows in t1 and t2 that match the count from t3.
> If you are using the schema and data given in Kuntal's script in email
> above, then you need to insert something like
> t1(900,900,900);t2(900,900,900);
>
> It generates plan like below:
>
> postgres=# explain analyse select sum(t1.i) from t1, t2 where
> t1.j=t2.j and t1.k = (select count(k) from t3) and t1.k=t2.k;
> QUERY PLAN
> ------------------------------------------------------------
> -----------------------------------------------------------------------
> Aggregate (cost=29.65..29.66 rows=1 width=8) (actual
> time=22572.521..22572.521 rows=1 loops=1)
> InitPlan 1 (returns $0)
> -> Finalize Aggregate (cost=9.70..9.71 rows=1 width=8) (actual
> time=4345.110..4345.111 rows=1 loops=1)
> -> Gather (cost=9.69..9.70 rows=2 width=8) (actual
> time=4285.019..4345.098 rows=3 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial Aggregate (cost=9.69..9.70 rows=1
> width=8) (actual time=0.154..0.155 rows=1 loops=3)
> -> Parallel Seq Scan on t3 (cost=0.00..8.75
> rows=375 width=4) (actual time=0.011..0.090 rows=300 loops=3)
> -> Nested Loop (cost=0.00..19.93 rows=1 width=4) (actual
> time=22499.918..22572.512 rows=1 loops=1)
> Join Filter: (t1.j = t2.j)
> -> Gather (cost=0.00..9.67 rows=2 width=12) (actual
> time=10521.356..10521.363 rows=1 loops=1)
> Workers Planned: 2
> Params Evaluated: $0
> Workers Launched: 2
> -> Parallel Seq Scan on t1 (cost=0.00..9.67 rows=1
> width=12) (actual time=0.506..0.507 rows=0 loops=3)
> Filter: (k = $0)
> Rows Removed by Filter: 299
> -> Materialize (cost=0.00..10.21 rows=2 width=8) (actual
> time=11978.557..12051.142 rows=1 loops=1)
> -> Gather (cost=0.00..10.20 rows=2 width=8) (actual
> time=11978.530..12051.113 rows=1 loops=1)
> Workers Planned: 2
> Params Evaluated: $0
> Workers Launched: 2
> -> Parallel Seq Scan on t2 (cost=0.00..10.20
> rows=1 width=8) (actual time=0.067..0.067 rows=0 loops=3)
> Filter: (k = $0)
> Rows Removed by Filter: 333
> Planning time: 15103.237 ms
> Execution time: 22574.703 ms
> (27 rows)
>
> You can notice that initplan is used at multiple nodes, but it will be
> evaluated just once. If you want, I can add a sentence in the
> comments, but I think this is somewhat obvious and the same use case
> already exists. Let me know if you still think that comments need to
> be expanded?
>

Thanks for providing details. Yes it is clear to me.

>
> > + if (IsA(plan, Gather))
> > + ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
> > initSetParam);
> > + else
> > + ((GatherMerge *) plan)->initParam =
> > bms_intersect(plan->lefttree->extParam, initSetParam);
> >
> >
> > I think the above code is to find out the common parameters that are
> prsent
> > in the external
> > and out params.
> >
>
> Here, we want to save all the initplan params that can be used below
> the gather node. extParam contains the set of all external PARAM_EXEC
> params that can be used below gather node and we just want initplan
> params out of those.
>
> > It may be better to explain the logic in the comments.
> >
>
> I have kept comments atop of the function set_param_references to
> explain the context, but now I have added few more in the code as
> suggested by you. See if that suffices the need.

Thanks for adding more details. It is easy to understand.

I marked the patch as ready for committer in the commitfest.

Regards,
Hari Babu
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-08-21 09:08:04 Re: why not parallel seq scan for slow functions
Previous Message Michael Meskes 2017-08-21 07:59:48 Re: assorted code cleanup