Re: Parallel Aggregate

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-18 07:25:00
Message-ID: CAA4eK1K5DTkqsNQsmeLBKQMpL6OcM10R1KbgzUy9ERffCAfEvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 17, 2016 at 6:41 PM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:
>
> On 18 March 2016 at 01:22, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> > <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> Updated patch is attached. Thanks for the re-review.
>

Few more comments:

1.

+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);

For final path, why do you want to sort just for group by case?

2.
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ total_groups));

In above part of patch, it seems you are using number of groups
differenetly; for create_group_path() and create_gather_path(), you have
used total_groups whereas for create_agg_path()
partial_grouped_rel->rows is used, is there a reason for the same?

3.
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *)
linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups =
path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel,
path,
+ NULL, &total_groups);

A. Won't passing partial_grouped_rel lead to incomplete information
required by create_gather_path() w.r.t the case of parameterized path info?
B. You have mentioned that passing grouped_rel will make gather path
contain the information of final path target, but what is the problem with
that? I mean to ask why Gather node is required to contain partial path
target information instead of final path target.
C. Can we consider passing pathtarget to create_gather_path() as that seems
to save us from inventing new UpperRelationKind? If you are worried about
adding the new parameter (pathtarget) to create_gather_path(), then I think
we are already passing it in many other path generation functions, so why
not for gather path generation as well?

4A.
Overall function create_grouping_paths() looks better than previous, but I
think still it is difficult to read. I think it can be improved by
generating partial aggregate paths separately as we do for nestloop
join refer function consider_parallel_nestloop

4B.
Rather than directly using create_gather_path(), can't we use
generate_gather_paths as for all places where we generate gather node,
generate_gather_paths() is used.

5.
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
{
..
..
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+
i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target-
>sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+
!= NULL)
+ {
+ /*
+
* It's a grouping column, so add it to the input target as-is.
+ */
+
add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+
}
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+
* call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+
}
..
}

Do we want to achieve something different in the above foreach loop than
the similar loop in make_group_input_target(), if not then why are they not
exactly same?

6.
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root,
input_target);
+ return input_target;

Can't we use return set_pathtarget_cost_width() directly rather than
fetching it in input_target and then returning input_target?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2016-03-18 07:48:22 Re: pgsql: Improve memory management for external sorts.
Previous Message Pavel Stehule 2016-03-18 07:22:18 Re: IF (NOT) EXISTS in psql-completion