| From: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> | 
|---|---|
| To: | Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> | 
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Combining Aggregates | 
| Date: | 2015-03-06 06:01:14 | 
| Message-ID: | CAFjFpReipCro__5fgi4w2O0Lg0vOM_AnjD1an5a+=tQrQ7HwLw@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Thu, Mar 5, 2015 at 9:30 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > On Wed, Mar 4, 2015 at 4:41 AM, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> > >> This thread mentions "parallel queries" as a use case, but that means
> > >> passing data between processes, and that requires being able to
> > >> serialize and deserialize the aggregate state somehow. For actual data
> > >> types that's not overly difficult I guess (we can use in/out
> functions),
> > >> but what about aggretates using 'internal' state? I.e. aggregates
> > >> passing pointers that we can't serialize?
> > >
> > > This is a good question. I really don't know the answer to it as I've
> not
> > > looked at the Robert's API for passing data between backends yet.
> > >
> > > Maybe Robert or Amit can answer this?
> >
> > I think parallel aggregation will probably require both the
> > infrastructure discussed here, namely the ability to combine two
> > transition states into a single transition state, and also the ability
> > to serialize and de-serialize transition states, which has previously
> > been discussed in the context of letting hash aggregates spill to
> > disk.  My current thinking is that the parallel plan will look
> > something like this:
> >
> > HashAggregateFinish
> > -> Funnel
> >     -> HashAggregatePartial
> >       -> PartialHeapScan
> >
> > So the workers will all pull from a partial heap scan and each worker
> > will aggregate its own portion of the data.  Then it'll need to pass
> > the results of that step back to the master, which will aggregate the
> > partial results and produce the final results.  I'm guessing that if
> > we're grouping on, say, column a, the HashAggregatePartial nodes will
> > kick out 2-column tuples of the form (<value for a>, <serialized
> > transition state for group with that value for a>).
> >
> It may not be an urgent topic to be discussed towards v9.5, however,
> I'd like to raise a topic about planner and aggregates.
>
> Once we could get the two phase aggregation, planner needs to pay
> attention possibility of partial aggregate during plan construction,
> even though our current implementation attach Agg node after the
> join/scan plan construction.
>
> Probably, a straightforward design is to add FunnelPath with some
> child nodes on set_rel_pathlist() or add_paths_to_joinrel().
> Its child node may be PartialAggregate node (or some other parallel
> safe node of course). In this case, it must inform the planner this
> node (for more correctness, targetlist of the node) returns partial
> aggregation, instead of the values row-by-row.
> Then, planner need to track and care about which type of data shall
> be returned to the upper node. Path node may have a flag to show it,
> and we also may need to inject dummy PartialAggregate if we try to
> join a relation that returns values row-by-row and another one that
> returns partial aggregate.
> It also leads another problem. The RelOptInfo->targetlist will
> depend on the Path node chosen. Even if float datum is expected as
> an argument of AVG(), its state to be combined is float[3].
> So, we will need to adjust the targetlist of RelOptInfo, once Path
> got chosen.
>
> Anyway, I could find out, at least, these complicated issues around
> two-phase aggregate integration with planner. Someone can suggest
> minimum invasive way for these integration?
>
>
Postgres-XC solved this question by creating a plan with two Agg/Group
nodes, one for combining transitioned result and one for creating the
distributed transition results (one per distributed run per group). So,
Agg/Group for combining result had as many Agg/Group nodes as there are
distributed/parallel runs. But XC chose this way to reduce the code
footprint. In Postgres, we can have different nodes for combining and
transitioning as you have specified above. Aggregation is not pathified in
current planner, hence XC took the approach of pushing the Agg nodes down
the plan tree when there was distributed/parallel execution possible. If we
can get aggregation pathified, we can go by path-based approach which might
give a better judgement of whether or not to distribute the aggregates
itself.
Looking at Postgres-XC might be useful to get ideas. I can help you there.
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2015-03-06 06:39:07 | Re: Strange assertion using VACOPT_FREEZE in vacuum.c | 
| Previous Message | Tom Lane | 2015-03-06 05:16:19 | Re: Weirdly pesimistic estimates in optimizer |