Re: parallelize queries containing initplans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-10-02 14:43:04
Message-ID: CA+TgmoaQiQCgvqFmDOXjnnkPDCyLkLpPirWvLXGskFn5Xms03g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 14, 2017 at 10:45 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> The latest patch again needs to be rebased. Find rebased patch
> attached with this email.

I read through this patch this morning. Copying Tom in the hopes
that he might chime in on the following two issues in particular:

1. Is there any superior alternative to adding ptype to ParamExecData?
I think the reason why we don't have this already is because the plan
tree that populates the Param must output the right type and the plan
tree that reads the Param must expect the right type, and there's no
need for anything else. But for serialization and deserialization
this seems to be necessary. I wonder whether it would be better to
try to capture this at the time the Param is generated (e.g.
var->vartype in assign_param_for_var) rather than derived at execution
time by applying exprType(), but I'm not sure how that would work
exactly, or whether it's really any better.

2. Do max_parallel_hazard_walker and set_param_references() have the
right idea about which parameters are acceptable candidates for this
optimization? The idea seems to be to collect the setParam sets for
all initplans between the current query level and the root. That
looks correct to me but I'm not an expert on this parameter stuff.

Some other comments:

- I think parallel_hazard_walker should likely work by setting
safe_param_ids to the right set of parameter IDs rather than testing
whether the parameter is safe by checking either whether it is in
safe_param_ids or some other condition is met.

- contains_parallel_unsafe_param() sounds like a function that merely
returns true or false, but it actually has major side effects. Those
side effects also look unsafe; mutating previously-generated paths can
corrupt the rel's pathlist, because it will no longer have the sort
order and other characteristics that add_path() creates and upon which
other code relies.

- Can't is_initplan_is_below_current_query_level() be confused when
there are multiple subqueries in the tree? For example if the
toplevel query has subqueries a and b and a has a sub-subquery aa
which has an initplan, won't this function think that b has an
initplan below the current query level? If not, maybe a comment is in
order explaining why not?

- Why do we even need contains_parallel_unsafe_param() and
is_initplan_is_below_current_query_level() in the first place, anyway?
I think there's been some discussion of that on this thread, but I'm
not sure I completely understand it, and the comments in the patch
don't help me understand why we now need this restriction.

- The new code in ExplainPrintPlan() needs a comment.

- I have typically referred in comments to "Gather or Gather Merge"
rather than "gather or gather merge".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-10-02 14:44:32 Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Previous Message Michael Paquier 2017-10-02 14:19:54 Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()