Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-17 05:05:42
Message-ID: CAKJS1f-dYhv+1_Sgudv5r_cP-XgVuD8EaKrOmJJrozepsn=e=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 March 2016 at 01:19, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Few assorted comments:
>
> 1.
> /*
> + * Determine if it's possible to perform aggregation in parallel using
> + * multiple worker processes. We can permit this when there's at least one
> + * partial_path in input_rel, but not if the query has grouping sets,
> + * (although this likely just requires a bit more thought). We must also
> + * ensure that any aggregate functions which are present in either the
> + * target list, or in the HAVING clause all support parallel mode.
> + */
> + can_parallel = false;
> +
> + if ((parse->hasAggs || parse->groupClause != NIL) &&
> + input_rel->partial_pathlist != NIL &&
> + parse->groupingSets == NIL &&
> + root->glob->parallelModeOK)
>
> I think here you need to use has_parallel_hazard() with second parameter as
> false to ensure expressions are parallel safe. glob->parallelModeOK flag
> indicates that there is no parallel unsafe expression, but it can still
> contain parallel restricted expression.

Yes, I'd not gotten to fixing that per Robert's original comment about
it, but I think I have now.

> 2.
> AggPath *
> create_agg_path(PlannerInfo *root,
> @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>
> List *groupClause,
> List *qual,
> const AggClauseCosts
> *aggcosts,
> - double numGroups)
> + double numGroups,
> +
> bool combineStates,
> + bool finalizeAggs)
>
> Don't you need to set parallel_aware flag in this function as we do for
> create_seqscan_path()?

I don't really know the answer to that... I mean there's nothing
special done in nodeAgg.c if the node is running in a worker or in the
main process. So I guess the only difference is that EXPLAIN will read
"Parallel Partial (Hash|Group)Aggregate" instead of "Partial
(Hash|Group)Aggregate", is that desired? What's the logic behind
having "Parallel" in EXPLAIN?

> 3.
> postgres=# explain select count(*) from t1;
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> ------
> Finalize Aggregate (cost=45420.57..45420.58 rows=1 width=8)
> -> Gather (cost=45420.35..45420.56 rows=2 width=8)
> Number of Workers: 2
> -> Partial Aggregate (cost=44420.35..44420.36 rows=1 width=8)
> -> Parallel Seq Scan on t1 (cost=0.00..44107.88 rows=124988
> wid
> th=0)
> (5 rows)
>
> Isn't it better to call it as Parallel Aggregate instead of Partial
> Aggregate. Initialy, we have kept Partial for seqscan, but later on we
> changed to Parallel Seq Scan, so I am not able to think why it is better to
> call Partial incase of Aggregates.

I already commented on this.

> 4.
> /*
> + * Likewise for any partial paths, although this case is more simple as
> +
> * we don't track the cheapest path.
> + */
> + foreach(lc, current_rel->partial_pathlist)
> +
> {
> + Path *subpath = (Path *) lfirst(lc);
> +
> + Assert(subpath->param_info ==
> NULL);
> + lfirst(lc) = apply_projection_to_path(root, current_rel,
> +
> subpath, scanjoin_target);
> + }
> +
>
>
> Can't we do this by teaching apply_projection_to_path() as done in the
> latest patch posted by me to push down the target list beneath workers [1].

Probably, but I'm not sure I want to go changing that now. The patch
is useful without it, so perhaps it can be a follow-on fix.

> 5.
> + /*
> + * If we find any aggs with an internal transtype then we must ensure
> + * that
> pointers to aggregate states are not passed to other processes,
> + * therefore we set the maximum degree
> to PAT_INTERNAL_ONLY.
> + */
> + if (aggform->aggtranstype == INTERNALOID)
> +
> context->allowedtype = PAT_INTERNAL_ONLY;
>
> In the above comment, you have refered maximum degree which is not making
> much sense to me. If it is not a typo, then can you clarify the same.

hmm. I don't quite understand the confusion. Perhaps you think of
"degree" in the parallel sense? This is talking about the levels of
degree of partial aggregation, which is nothing to do with parallel
aggregation, parallel aggregation just requires that to work. The
different. "Degree" in this sense was just meaning that PAY_ANY is the
highest degree, PAT_INTERNAL_ONLY lesser so etc. I thought the
"degree" thing was explained ok in the comment for
aggregates_allow_partial(), but perhaps I should just remove it, if
it's confusing.

Changed to:
/*
* If we find any aggs with an internal transtype then we must ensure
* that pointers to aggregate states are not passed to other processes,
* therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
*/

> 6.
> + * fix_combine_agg_expr
> + * Like fix_upper_expr() but additionally adjusts the Aggref->args of
> + * Aggrefs so
> that they references the corresponding Aggref in the subplan.
> + */
> +static Node *
> +fix_combine_agg_expr(PlannerInfo
> *root,
> + Node *node,
> + indexed_tlist *subplan_itlist,
> +
> Index newvarno,
> + int rtoffset)
> +{
> + fix_upper_expr_context context;
> +
> + context.root =
> root;
> + context.subplan_itlist = subplan_itlist;
> + context.newvarno = newvarno;
> + context.rtoffset = rtoffset;
> +
> return fix_combine_agg_expr_mutator(node, &context);
> +}
> +
> +static Node *
> +fix_combine_agg_expr_mutator(Node *node,
> fix_upper_expr_context *context)
>
> Don't we want to handle the case of context->subplan_itlist->has_non_vars as
> it is handled in fix_upper_expr_mutator()? If not then, I think adding the
> reason for same in comments above function would be better.

Yes it should be doing the same as fix_upper_expr_mutator with the
exception of handling of Aggrefs Will fix. Thanks!

> 7.
> tlist.c
>
> +}
> \ No newline at end of file
>
> There should be a new line at end of file.

Thanks.

> [1] -
> http://www.postgresql.org/message-id/CAA4eK1Jk8hm-2j-CKjvdd0CZTsdPX=EdK_qhzc4689hq0xtfMQ@mail.gmail.com

Updated patch attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Allow-aggregation-to-happen-in-parallel_2016-03-17.patch application/octet-stream 50.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-03-17 06:25:00 Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Previous Message Pavel Stehule 2016-03-17 04:44:46 Re: plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types