Re: Parallel Aggregate

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-16 12:19:16
Message-ID: CAA4eK1+zm7iog8RHSrFC-AZ0qxKPJiTekJAOSwHmW5AG9_nTRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 16, 2016 at 4:19 PM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:
>
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref. But that's not what you've got here. A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now. Alternatively, Aggref can do everything. But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>

Few assorted comments:

1.
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)

I think here you need to use has_parallel_hazard() with second parameter as
false to ensure expressions are parallel safe. glob->parallelModeOK flag
indicates that there is no parallel unsafe expression, but it can still
contain parallel restricted expression.

2.
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,

List *groupClause,
List *qual,
const AggClauseCosts
*aggcosts,
- double numGroups)
+ double numGroups,
+
bool combineStates,
+ bool finalizeAggs)

Don't you need to set parallel_aware flag in this function as we do for
create_seqscan_path()?

3.
postgres=# explain select count(*) from t1;
QUERY PLAN

--------------------------------------------------------------------------------
------
Finalize Aggregate (cost=45420.57..45420.58 rows=1 width=8)
-> Gather (cost=45420.35..45420.56 rows=2 width=8)
Number of Workers: 2
-> Partial Aggregate (cost=44420.35..44420.36 rows=1 width=8)
-> Parallel Seq Scan on t1 (cost=0.00..44107.88
rows=124988 wid
th=0)
(5 rows)

Isn't it better to call it as Parallel Aggregate instead of Partial
Aggregate. Initialy, we have kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to think why it is better to
call Partial incase of Aggregates.

4.
/*
+ * Likewise for any partial paths, although this case is more simple as
+
* we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+
{
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info ==
NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+
subpath, scanjoin_target);
+ }
+

Can't we do this by teaching apply_projection_to_path() as done in the
latest patch posted by me to push down the target list beneath workers [1].

5.
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that
pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree
to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+
context->allowedtype = PAT_INTERNAL_ONLY;

In the above comment, you have refered maximum degree which is not making
much sense to me. If it is not a typo, then can you clarify the same.

6.
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so
that they references the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo
*root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+
Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root =
root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+
return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node,
fix_upper_expr_context *context)

Don't we want to handle the case of context->subplan_itlist->has_non_vars
as it is handled in fix_upper_expr_mutator()? If not then, I think adding
the reason for same in comments above function would be better.

7.
tlist.c

+}
\ No newline at end of file

There should be a new line at end of file.

[1] -
http://www.postgresql.org/message-id/CAA4eK1Jk8hm-2j-CKjvdd0CZTsdPX=EdK_qhzc4689hq0xtfMQ@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-16 12:29:31 Re: Parallel Aggregate
Previous Message Valery Popov 2016-03-16 12:14:23 Re: Password identifiers, protocol aging and SCRAM protocol