Re: [HACKERS] why not parallel seq scan for slow functions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Date: 2018-01-02 11:38:55
Message-ID: CAA4eK1LEuQihTqa3cBW9pZ0xfZYwg5YJtj+nPkLFN-7PJ59=qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 29, 2017 at 7:56 PM, Marina Polyakova
<m(dot)polyakova(at)postgrespro(dot)ru> wrote:
> Hello everyone in this thread!
>
> On 29-11-2017 8:01, Michael Paquier wrote:
>>
>> Moved to next CF for extra reviews.
>
>
> Amit, I would like to ask some questions about your patch (and can you
> please rebase it on the top of the master?):
>

Thanks for looking into the patch.

> 1)
> + path->total_cost -= (target->cost.per_tuple -
> oldcost.per_tuple) * path->rows;
>
> Here we undo the changes that we make in this function earlier:
>
> path->total_cost += target->cost.startup - oldcost.startup +
> (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
>
> Perhaps we should not start these "reversible" changes in this case from the
> very beginning?
>

We can do that way as well, however, when the patch was written we
don't have GatherMerge handling in that function due to which it
appears to me that changing the code the way you are suggesting will
complicate the code, but now it looks saner to do it that way. I have
changed the code accordingly.

> 2)
> gpath->subpath = (Path *)
> create_projection_path(root,
>
> gpath->subpath->parent,
>
> gpath->subpath,
> target);
> ...
> + if (((ProjectionPath *) gpath->subpath)->dummypp)
> + {
> ...
> + path->total_cost += (target->cost.per_tuple -
> oldcost.per_tuple) * gpath->subpath->rows;
> + }
> + else
> + {
> ...
> + path->total_cost += (cpu_tuple_cost +
> target->cost.per_tuple) * gpath->subpath->rows;
> + }
>
> As I understand it, here in the if-else block we change the run costs of
> gpath in the same way as they were changed for its subpath in the function
> create_projection_path earlier. But for the startup costs we always subtract
> the cost of the old target:
>
> path->startup_cost += target->cost.startup - oldcost.startup;
> path->total_cost += target->cost.startup - oldcost.startup +
> (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
>
> Should we change the startup costs of gpath in this way if ((ProjectionPath
> *) gpath->subpath)->dummypp is false?
>

The startup costs will be computed in the way you mentioned for both
gpath and gmpath as that code is executed before we do any handling
for Gather or GatherMerge path.

> 3)
> simple_gather_path = (Path *)
> create_gather_path(root, rel, cheapest_partial_path,
> rel->reltarget,
> NULL, NULL);
> + /* Add projection step if needed */
> + if (target && simple_gather_path->pathtarget != target)
> + simple_gather_path = apply_projection_to_path(root, rel,
> simple_gather_path, target);
> ...
> + path = (Path *) create_gather_merge_path(root, rel, subpath,
> rel->reltarget,
> +
> subpath->pathkeys, NULL, NULL);
> + /* Add projection step if needed */
> + if (target && path->pathtarget != target)
> + path = apply_projection_to_path(root, rel, path,
> target);
> ...
> @@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
> ...
> gpath->subpath = (Path *)
> create_projection_path(root,
>
> gpath->subpath->parent,
>
> gpath->subpath,
> target);
>
> The target is changing so we change it for the gather(merge) node and for
> its subpath. Do not we have to do this work (replace the subpath by calling
> the function create_projection_path if the target is different) in the
> functions create_gather(_merge)_path too? I suppose that the target of the
> subpath affects its costs => the costs of the gather(merge) node in the
> functions cost_gather(_merge) (=> the costs of the gather(merge) node in the
> function apply_projection_to_path).
>

The cost impact due to different target will be taken care when we
call apply_projection_to_path. I am not sure if I understand your
question completely, but do you see anything in functions
cost_gather(_merge) which is suspicious and the target list costing
can impact the same. Generally, we compute the target list cost in
the later stage of planning once the path target is formed.

> 4)
> + * Consider ways to implement parallel paths. We always
> skip
> + * generating parallel path for top level scan/join nodes
> till the
> + * pathtarget is computed. This is to ensure that we can
> account for
> + * the fact that most of the target evaluation work will be
> performed
> + * in workers.
> + */
> + generate_gather_paths(root, current_rel, scanjoin_target);
> +
> + /* Set or update cheapest_total_path and related fields */
> + set_cheapest(current_rel);
>
> As I understand it (if correctly, thank the comments of Robert Haas and Tom
> Lane :), after that we cannot use the function apply_projection_to_path for
> paths in the current_rel->pathlist without risking that the cheapest path
> will change. And we have several calls to the function adjust_paths_for_srfs
> (which uses apply_projection_to_path for paths in the current_rel->pathlist)
> in grouping_planner after generating the gather paths:
>

I think in general that is true even without this patch and without
having parallel paths. The main thing we are trying to cover with
this patch is that we try to generate parallel paths for top-level rel
after path target is computed so that the costing can take into
account the fact that bulk of target list evaluation can be done in
workers. I think adjust_paths_for_srfs can impact costing for
parallel paths if the ProjectSet nodes are allowed to be pushed to
workers, but I don't see that happening (See
create_set_projection_path.). If by any chance, you have an example
test to show the problem due to the point you have mentioned, then it
can be easier to see whether it can impact the selection of parallel
paths?

> 5)
> + * If this is a baserel and not the top-level rel, consider
> gathering any
> + * partial paths we may have created for it. (If we tried to gather
> + * inheritance children, we could end up with a very large number of
> + * gather nodes, each trying to grab its own pool of workers, so
> don't do
> + * this for otherrels. Instead, we'll consider gathering partial
> paths
> + * for the parent appendrel.). We can check for joins by counting
> the
> + * membership of all_baserels (note that this correctly counts
> inheritance
> + * trees as single rels). The gather path for top-level rel is
> generated
> + * once path target is available. See grouping_planner.
> */
> - if (rel->reloptkind == RELOPT_BASEREL)
> - generate_gather_paths(root, rel);
> + if (rel->reloptkind == RELOPT_BASEREL &&
> + bms_membership(root->all_baserels) != BMS_SINGLETON)
> + generate_gather_paths(root, rel, NULL);
>
> Do I understand correctly that here there's an assumption about
> root->query_level (which we don't need to check)?
>

I don't think we need to check query_level, but can you please be more
explicit about the point you have in mind? Remember that we never
generate gather path atop any other gather path, if that is the
assumption you have in mind then we don't need any extra check for the
same.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_paths_include_tlist_cost_v7.patch application/octet-stream 12.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-01-02 12:08:46 Re: Add default role 'pg_access_server_files'
Previous Message Amit Khandekar 2018-01-02 11:26:31 Re: [HACKERS] UPDATE of partition key