Re: parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-03-14 09:50:04
Message-ID: CAA4eK1KpwhxYUe1iRi5Q-jWD_1kOpgSaP=zj35OnoWH2MoHVoA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 10, 2017 at 4:34 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I could see two possibilities to determine whether the plan (for which
> we are going to generate an initplan) contains a reference to a
> correlated var param node. One is to write a plan or path walker to
> detect any such reference, and the second is to keep the information
> about the correlated param in the path node. I think the drawback of the
> first approach is that traversing the path tree during generation of an
> initplan can be costly, so for now I have kept the information in the path
> node to prohibit generating parallel initplans that contain a
> reference to correlated vars. I think we can go with the first approach of
> using a path walker if people feel that is better than maintaining a
> reference in the path. The attached patch
> prohibit_parallel_correl_params_v1.patch implements the second
> approach of keeping the correlated var param reference in the path node,
> and pq_pushdown_initplan_v2.patch uses that to generate parallel
> initplans.
>
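
To make the trade-off in the quoted text concrete, here is a minimal sketch in Python (not the planner's C) contrasting the two options: walking the path subtree on every check versus caching a flag in each path node when it is built. `Path`, `refs_correlated_param`, `subtree_has_correlated_param` and both functions are hypothetical names for illustration only; they do not match PostgreSQL's actual structures or the patch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Path:
    """Hypothetical, simplified path node (not PostgreSQL's Path struct)."""
    name: str
    children: List["Path"] = field(default_factory=list)
    # True if this node itself references a correlated (outer-level) var param.
    refs_correlated_param: bool = False
    # Approach 2: flag covering this node and its whole subtree, computed
    # once when the node is built.
    subtree_has_correlated_param: bool = field(init=False)

    def __post_init__(self) -> None:
        # Children are built before their parent, so their cached flags are
        # already set; this costs O(number of direct children).
        self.subtree_has_correlated_param = self.refs_correlated_param or any(
            child.subtree_has_correlated_param for child in self.children
        )


def walker_has_correlated_param(path: Path) -> bool:
    """Approach 1: walk the entire subtree every time the question is asked."""
    if path.refs_correlated_param:
        return True
    return any(walker_has_correlated_param(child) for child in path.children)


def can_be_parallel_initplan(path: Path) -> bool:
    """Approach 2: constant-time check using the flag cached in the path node."""
    return not path.subtree_has_correlated_param
```

Both approaches give the same answer, e.g. for `Path("Aggregate", [Path("Seq Scan", refs_correlated_param=True)])`. The walker pays a cost proportional to the subtree size on each check, while the cached flag costs one extra field per path node.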

Two weeks back, when Robert was in Bangalore, we (Kuntal, Robert and
myself) had a discussion on this patch. He mentioned that the idea
used in this patch of pulling up initplans (uncorrelated initplans) to
the Gather node (and then executing them and sharing the values with
each worker) doesn't sound appealing and has a chance of bugs in some
corner cases. We discussed an idea where the first worker to access the
initplan would evaluate it and then share the value with the other
participating processes, but with that, we won't be able to use
parallelism in the execution of the initplan itself, due to the
restriction against multiple levels of Gather nodes. Another idea we
discussed is that we can evaluate the initplans at the Gather node if
they are used as an external param (plan->extParam) at or below the
Gather node.
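
A small Python sketch of that selection rule, assuming an initplan is run at the Gather only when one of the params it sets appears in the Gather's extParam (the params referenced at or below the Gather). `InitPlan`, `set_params` and `initplans_to_run_at_gather` are hypothetical names, and Python sets stand in for PostgreSQL's Bitmapset.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InitPlan:
    """Hypothetical initplan descriptor: the param IDs its output sets."""
    name: str
    set_params: FrozenSet[int]


def initplans_to_run_at_gather(gather_ext_params: FrozenSet[int],
                               initplans: List[InitPlan]) -> List[InitPlan]:
    """Pick the initplans whose output is consumed at or below the Gather.

    gather_ext_params stands in for the Gather plan's extParam: the external
    params referenced by the Gather node or anything beneath it.
    """
    return [ip for ip in initplans if ip.set_params & gather_ext_params]
```

With an extParam of `{1, 3}`, an initplan setting param 1 is selected and one setting only param 2 is not, so it stays evaluated wherever it is used.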

Based on that idea, I have modified the patch so that it computes the
set of initplan Params that are required below the Gather node and
stores them as a bitmap of initplan params at the Gather node. During
set_plan_references, we find the intersection of the external
parameters that are required at the Gather node or the nodes below it
with the initplan params that are passed from the same or an upper query
level. Once the set of initplan params is established, we evaluate
those (if they are not already evaluated) before executing the Gather
node and then pass the computed values to each of the workers. To
identify whether a particular param is parallel-safe, we check whether
the paramid of the param exists in the initplans at the same or an upper
query level. We don't allow generating a Gather path if there are
initplans at some query level below the current query level, as those
plans could be parallel-unsafe or indirectly correlated.
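
The rules in the previous paragraph can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the patch's code: a "bitmap" is a Python int with bit N set for paramid N, query levels are numbered from 1 (outermost) upward as queries nest, and every function name here is hypothetical. It covers the planning-time intersection, the parallel-safety check, the refusal of a Gather path when lower levels have initplans, and evaluate-once-then-share at execution time.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical model: a "bitmap" of param IDs is a Python int with bit N set
# for paramid N; query levels count up from 1 (outermost) as queries nest.


def bitmap_of(paramids: Iterable[int]) -> int:
    """Build a bitmap with one bit set per param ID."""
    bits = 0
    for pid in paramids:
        bits |= 1 << pid
    return bits


def bitmap_members(bits: int) -> List[int]:
    """Return the param IDs set in the bitmap, in ascending order."""
    members, pid = [], 0
    while bits:
        if bits & 1:
            members.append(pid)
        bits >>= 1
        pid += 1
    return members


def visible_initplan_params(initplan_params_by_level: Dict[int, int],
                            current_level: int) -> int:
    """Union of initplan params produced at the current or an upper level."""
    visible = 0
    for level, params in initplan_params_by_level.items():
        if level <= current_level:
            visible |= params
    return visible


def gather_initplan_params(ext_param: int,
                           initplan_params_by_level: Dict[int, int],
                           current_level: int) -> int:
    """Planning time: the initplan params the Gather must evaluate, i.e. the
    Gather subtree's extParam intersected with the visible initplan params."""
    return ext_param & visible_initplan_params(initplan_params_by_level,
                                               current_level)


def param_is_parallel_safe(paramid: int,
                           initplan_params_by_level: Dict[int, int],
                           current_level: int) -> bool:
    """Treat a param as parallel-safe if an initplan at the current or an
    upper query level produces it."""
    visible = visible_initplan_params(initplan_params_by_level, current_level)
    return bool(visible & (1 << paramid))


def gather_path_allowed(initplan_params_by_level: Dict[int, int],
                        current_level: int) -> bool:
    """Refuse a Gather path when any lower (nested) level has initplans."""
    return not any(params for level, params in initplan_params_by_level.items()
                   if level > current_level)


def exec_gather_initplans(param_bits: int, values: Dict[int, object],
                          evaluate: Callable[[int], object]) -> Dict[int, object]:
    """Execution time: evaluate each needed param once, skipping ones that are
    already computed, and return the values to copy to every worker."""
    for pid in bitmap_members(param_bits):
        if pid not in values:
            values[pid] = evaluate(pid)
    return {pid: values[pid] for pid in bitmap_members(param_bits)}
```

For example, with initplan params `{0, 1}` at level 1 and `{2}` at level 2, a Gather at level 1 whose subtree needs params `{1, 2, 5}` would evaluate only param 1, and a Gather path at level 1 would be refused because level 2 has initplans.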

This rules out parallelism in some cases, such as when initplans are
below the Gather node, but the patch looks better. We can open up those
cases in a separate patch if required.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: pq_pushdown_initplan_v3.patch (application/octet-stream, 33.6 KB)
