Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-17 13:11:09
Message-ID: CAKJS1f_0F5U_L6MeGpkD2tS79fWRmtVhWc4hwJFPa=6P8=YCgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 March 2016 at 01:22, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>>
>> On 17 March 2016 at 01:19, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > Few assorted comments:
>> >
>> > 2.
>> > AggPath *
>> > create_agg_path(PlannerInfo *root,
>> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>> >
>> > List *groupClause,
>> > List *qual,
>> > const AggClauseCosts
>> > *aggcosts,
>> > - double numGroups)
>> > + double numGroups,
>> > +
>> > bool combineStates,
>> > + bool finalizeAggs)
>> >
>> > Don't you need to set parallel_aware flag in this function as we do for
>> > create_seqscan_path()?
>>
>> I don't really know the answer to that... I mean there's nothing
>> special done in nodeAgg.c if the node is running in a worker or in the
>> main process.
>>
>
> On again thinking about it, I think it is okay to set parallel_aware flag as
> false. This flag means whether that particular node has any parallelism
> behaviour which is true for seqscan, but I think not for partial aggregate
> node.
>
> Few other comments on latest patch:
>
> 1.
>
> + /*
> + * XXX does this need estimated for each partial path, or are they
> + * all
> going to be the same anyway?
> + */
> + dNumPartialGroups = get_number_of_groups(root,
> +
> clamp_row_est(partial_aggregate_path->rows),
> +
> rollup_lists,
> +
> rollup_groupclauses);
>
> For considering partial groups, do we need to rollup related lists?

No it doesn't you're right. I did mean to remove these, but they're
NIL anyway. Seems better to remove them to prevent confusion.

> 2.
> + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
> +
> &agg_costs,
> +
> dNumPartialGroups);
> +
> + /*
> + * Generate a
> hashagg Path, if we can, but we'll skip this if the hash
> + * table looks like it'll exceed work_mem.
> +
> */
> + if (can_hash && hashaggtablesize < work_mem * 1024L)
>
>
> hash table size should be estimated only if can_hash is true.

Good point. Changed.

>
> 3.
> + foreach(lc, grouped_rel->partial_pathlist)
> + {
> + Path *path =
> (Path *) lfirst(lc);
> + double total_groups;
> +
> + total_groups = path-
>>parallel_degree * path->rows;
> +
> + path = (Path *) create_gather_path(root, grouped_rel, path,
> NULL,
> + &total_groups);
>
> Do you need to perform it foreach partial path or just do it for
> firstpartial path?

That's true. The order here does not matter since we're passing
directly into a Gather node, so it's wasteful to consider anything
apart from the cheapest path. -- Fixed.

There was also a missing hash table size check on the Finalize
HashAggregate Path consideration. I've added that now.

Updated patch is attached. Thanks for the re-review.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch application/octet-stream 51.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-17 13:14:58 Re: [HACKERS] pgbench -C -M prepared gives an error
Previous Message Robert Haas 2016-03-17 13:01:36 Re: Performance degradation in commit ac1d794