Re: Parallel Aggregate

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-16 14:47:00
Message-ID: CA+TgmobF6bLxXJj-g8A1naYGGuSm-VgLUr=8Dh2FG58KLGwcgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 16, 2016 at 7:57 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> On 16 March 2016 at 15:04, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I don't think I'd be objecting if you made PartialAggref a real
>>> alternative to Aggref. But that's not what you've got here. A
>>> PartialAggref is just a wrapper around an underlying Aggref that
>>> changes the interpretation of it - and I think that's not a good idea.
>>> If you want to have Aggref and PartialAggref as truly parallel node
>>> types, that seems cool, and possibly better than what you've got here
>>> now. Alternatively, Aggref can do everything. But I don't think we
>>> should go with this wrapper concept.
>>
>> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
>> quite happy with how it turned out. I made
>> search_indexed_tlist_for_partial_aggref() to follow-on the series of
>> other search_indexed_tlist_for_* functions and have made it behave the
>> same way, by returning the newly created Var instead of doing that in
>> fix_combine_agg_expr_mutator(), as the last version did.
>>
>> Thanks for the suggestion.
>>
>> New patch attached.
>
> Cool! Why not initialize aggpartialtype always?

More review comments:

/*
+ * Likewise for any partial paths, although this case
is more simple as
+ * we don't track the cheapest path.
+ */

I think in the process of getting rebased over the rapidly-evolving
underlying substructure, this comment no longer makes much sense where
it is in the file. IIUC, the comment is referring back to "Forcibly
apply that target to all the Paths for the scan/join rel", but there's
now enough other stuff in the middle that it doesn't really make sense
any more. And actually, I think you should move the code up higher,
not change the comment. This belongs before setting
root->upper_targets[foo].

The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
have both complained about, wrong in detail because it doesn't call
has_parallel_hazard anywhere. Basically, you have the wrong design.
There shouldn't be any need to check parallelModeOK here. Rather,
what you should be doing is setting consider_parallel to true or false
on the upper rel. See set_rel_consider_parallel for how this is set
for base relations, set_append_rel_size() for append relations, and
perhaps most illustratively build_join_rel() for join relations. You
should have some equivalent of this logic for upper rels, or at least
the upper rels you care about:

if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
!has_parallel_hazard((Node *) restrictlist, false))
joinrel->consider_parallel = true;

Then, you can consider parallel aggregation if consider_parallel is
true and any other conditions that you care about are also met.

I think that the way you are considering sorted aggregation, hashed
aggregation, and mixed strategies does not make very much sense. It
seems to me that what you should do is:

1. Figure out the cheapest PartialAggregate path. You will need to
compare the costs of (a) a hash aggregate, (b) an explicit sort +
group aggregate, and (c) grouping a presorted path. (We can
technically skip (c) for right now since it can't occur.) I would go
ahead and use add_partial_path() on these to stuff them into the
partial_pathlist for the upper rel.

2. Take the first (cheapest) path in the partial_pathlist and stick a
Gather node on top of it. Store this in a local variable, say,
partial_aggregate_path.

3. Construct a finalize-hash-aggregate path for partial_aggregate_path
and also a sort+finalize-group/plain-aggregate path for
partial_aggregate_path, and add each of those to the upper rel. They
will either beat out the non-parallel paths or they won't.

The point is that the decision as to whether to use hashing or sorting
below the Gather is completely independent from the choice of which
one to use above the Gather. Pick the best strategy below the Gather;
then pick the best strategy to stick on top of that above the Gather.

* This is very similar to make_group_input_target(), only we do not recurse
* into Aggrefs. Aggrefs are left intact and added to the target list. Here we
* also add any Aggrefs which are located in the HAVING clause into the
* PathTarget.
*
* Aggrefs are also setup into partial mode and the partial return types are
* set to become the type of the aggregate transition state rather than the
* aggregate function's return type.

This comment isn't very helpful because it tells you what the function
does, which you can find out anyway from reading the code. What it
should do is explain why it does it. Just to take one particular
point, why would we not want to recurse into Aggrefs in this case?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-03-16 14:49:52 Re: Declarative partitioning
Previous Message Teodor Sigaev 2016-03-16 14:31:32 Re: WIP: Access method extendability