Re: [HACKERS] why not parallel seq scan for slow functions

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Date: 2017-12-29 14:26:15
Message-ID: afd43869dcb18830118e9bcd813cbb4c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello everyone in this thread!

On 29-11-2017 8:01, Michael Paquier wrote:
> Moved to next CF for extra reviews.

Amit, I would like to ask some questions about your patch (and can you
please rebase it on the top of the master?):

1)
+ path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) *
path->rows;

Here we undo the changes that we make in this function earlier:

path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Perhaps we should not start these "reversible" changes in this case from
the very beginning?

2)
gpath->subpath = (Path *)
create_projection_path(root,
gpath->subpath->parent,
gpath->subpath,
target);
...
+ if (((ProjectionPath *) gpath->subpath)->dummypp)
+ {
...
+ path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) *
gpath->subpath->rows;
+ }
+ else
+ {
...
+ path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) *
gpath->subpath->rows;
+ }

As I understand it, here in the if-else block we change the run costs of
gpath in the same way as they were changed for its subpath in the
function create_projection_path earlier. But for the startup costs we
always subtract the cost of the old target:

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Should we change the startup costs of gpath in this way if
((ProjectionPath *) gpath->subpath)->dummypp is false?

3)
simple_gather_path = (Path *)
create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
NULL, NULL);
+ /* Add projection step if needed */
+ if (target && simple_gather_path->pathtarget != target)
+ simple_gather_path = apply_projection_to_path(root, rel,
simple_gather_path, target);
...
+ path = (Path *) create_gather_merge_path(root, rel, subpath,
rel->reltarget,
+ subpath->pathkeys, NULL, NULL);
+ /* Add projection step if needed */
+ if (target && path->pathtarget != target)
+ path = apply_projection_to_path(root, rel, path, target);
...
@@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
...
gpath->subpath = (Path *)
create_projection_path(root,
gpath->subpath->parent,
gpath->subpath,
target);

The target is changing so we change it for the gather(merge) node and
for its subpath. Do not we have to do this work (replace the subpath by
calling the function create_projection_path if the target is different)
in the functions create_gather(_merge)_path too? I suppose that the
target of the subpath affects its costs => the costs of the
gather(merge) node in the functions cost_gather(_merge) (=> the costs of
the gather(merge) node in the function apply_projection_to_path).

4)
+ * Consider ways to implement parallel paths. We always skip
+ * generating parallel path for top level scan/join nodes till the
+ * pathtarget is computed. This is to ensure that we can account for
+ * the fact that most of the target evaluation work will be performed
+ * in workers.
+ */
+ generate_gather_paths(root, current_rel, scanjoin_target);
+
+ /* Set or update cheapest_total_path and related fields */
+ set_cheapest(current_rel);

As I understand it (if correctly, thank the comments of Robert Haas and
Tom Lane :), after that we cannot use the function
apply_projection_to_path for paths in the current_rel->pathlist without
risking that the cheapest path will change. And we have several calls to
the function adjust_paths_for_srfs (which uses apply_projection_to_path
for paths in the current_rel->pathlist) in grouping_planner after
generating the gather paths:

/* Now fix things up if scan/join target contains SRFs */
if (parse->hasTargetSRFs)
adjust_paths_for_srfs(root, current_rel,
scanjoin_targets,
scanjoin_targets_contain_srfs);
...
/* Fix things up if grouping_target contains SRFs */
if (parse->hasTargetSRFs)
adjust_paths_for_srfs(root, current_rel,
grouping_targets,
grouping_targets_contain_srfs);
...
/* Fix things up if sort_input_target contains SRFs */
if (parse->hasTargetSRFs)
adjust_paths_for_srfs(root, current_rel,
sort_input_targets,
sort_input_targets_contain_srfs);
...
/* Fix things up if final_target contains SRFs */
if (parse->hasTargetSRFs)
adjust_paths_for_srfs(root, current_rel,
final_targets,
final_targets_contain_srfs);

Maybe we should add the appropriate call to the function set_cheapest()
after this if parse->hasTargetSRFs is true?

5)
+ * If this is a baserel and not the top-level rel, consider gathering
any
+ * partial paths we may have created for it. (If we tried to gather
+ * inheritance children, we could end up with a very large number of
+ * gather nodes, each trying to grab its own pool of workers, so don't
do
+ * this for otherrels. Instead, we'll consider gathering partial
paths
+ * for the parent appendrel.). We can check for joins by counting the
+ * membership of all_baserels (note that this correctly counts
inheritance
+ * trees as single rels). The gather path for top-level rel is
generated
+ * once path target is available. See grouping_planner.
*/
- if (rel->reloptkind == RELOPT_BASEREL)
- generate_gather_paths(root, rel);
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ bms_membership(root->all_baserels) != BMS_SINGLETON)
+ generate_gather_paths(root, rel, NULL);

Do I understand correctly that here there's an assumption about
root->query_level (which we don't need to check)?

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2017-12-29 14:39:39 Re: [HACKERS] [PATCH] Lockable views
Previous Message Craig Ringer 2017-12-29 14:18:53 Re: [PATCH] session_replication_role = replica with TRUNCATE