From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Aggregate |
Date: | 2016-03-17 13:11:09 |
Message-ID: | CAKJS1f_0F5U_L6MeGpkD2tS79fWRmtVhWc4hwJFPa=6P8=YCgA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 18 March 2016 at 01:22, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>>
>> On 17 March 2016 at 01:19, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > Few assorted comments:
>> >
>> > 2.
>> > AggPath *
>> > create_agg_path(PlannerInfo *root,
>> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>> >
>> > List *groupClause,
>> > List *qual,
>> > const AggClauseCosts
>> > *aggcosts,
>> > - double numGroups)
>> > + double numGroups,
>> > +
>> > bool combineStates,
>> > + bool finalizeAggs)
>> >
>> > Don't you need to set parallel_aware flag in this function as we do for
>> > create_seqscan_path()?
>>
>> I don't really know the answer to that... I mean there's nothing
>> special done in nodeAgg.c if the node is running in a worker or in the
>> main process.
>>
>
> On again thinking about it, I think it is okay to set parallel_aware flag as
> false. This flag means whether that particular node has any parallelism
> behaviour which is true for seqscan, but I think not for partial aggregate
> node.
>
> Few other comments on latest patch:
>
> 1.
>
> + /*
> + * XXX does this need estimated for each partial path, or are they
> + * all
> going to be the same anyway?
> + */
> + dNumPartialGroups = get_number_of_groups(root,
> +
> clamp_row_est(partial_aggregate_path->rows),
> +
> rollup_lists,
> +
> rollup_groupclauses);
>
> For considering partial groups, do we need to rollup related lists?
No it doesn't you're right. I did mean to remove these, but they're
NIL anyway. Seems better to remove them to prevent confusion.
> 2.
> + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
> +
> &agg_costs,
> +
> dNumPartialGroups);
> +
> + /*
> + * Generate a
> hashagg Path, if we can, but we'll skip this if the hash
> + * table looks like it'll exceed work_mem.
> +
> */
> + if (can_hash && hashaggtablesize < work_mem * 1024L)
>
>
> hash table size should be estimated only if can_hash is true.
Good point. Changed.
>
> 3.
> + foreach(lc, grouped_rel->partial_pathlist)
> + {
> + Path *path =
> (Path *) lfirst(lc);
> + double total_groups;
> +
> + total_groups = path-
>>parallel_degree * path->rows;
> +
> + path = (Path *) create_gather_path(root, grouped_rel, path,
> NULL,
> + &total_groups);
>
> Do you need to perform it foreach partial path or just do it for
> firstpartial path?
That's true. The order here does not matter since we're passing
directly into a Gather node, so it's wasteful to consider anything
apart from the cheapest path. -- Fixed.
There was also a missing hash table size check on the Finalize
HashAggregate Path consideration. I've added that now.
Updated patch is attached. Thanks for the re-review.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch | application/octet-stream | 51.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-03-17 13:14:58 | Re: [HACKERS] pgbench -C -M prepared gives an error |
Previous Message | Robert Haas | 2016-03-17 13:01:36 | Re: Performance degradation in commit ac1d794 |