Re: Combining Aggregates

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2015-07-27 00:14:19
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80111D24A@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 04/01/2015 06:28 PM, Robert Haas wrote:
> > On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>> I've been thinking of bumping this patch to the June commitfest as the
> >>> patch only exists to provide the basic infrastructure for things like
> >>> parallel aggregation, aggregate before join, and perhaps auto updating
> >>> materialised views.
> >>>
> >>> It seems unlikely that any of those things will happen for 9.5.
> >>
> >> Yeah, I guess so...
> >>
> >>> Does anybody object to me moving this to June's commitfest?
> >>
> >> Not from my side FWIW. I think it actually makes sense.
> >
> > +1. I'd like to devote some time to looking at this, but I don't have
> > the time right now. The chances that we can do something useful with
> > it in 9.6 seem good.
>
> And the June commitfest is now in progress.
>
> This patch seems sane to me, as far as it goes. However, there's no
> planner or executor code to use the aggregate combining for anything.
> I'm not a big fan of dead code, I'd really like to see something to use
> this.
>
+1, this patch itself looks good for me, but...

> The main use case people have been talking about is parallel query, but
> is there some other case this would be useful right now, without the
> parallel query feature? You and Simon talked about this case:
>
> > 2. Queries such as:
> >
> > SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
> > = p.product_id GROUP BY p.name;
> >
> > Such a query could be transformed into:
> >
> > SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
> > GROUP BY product_id) s
> > INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;
> >
> > Of course the outer query's SUM and GROUP BY would not be required if there
> > happened to be a UNIQUE index on product(name), but assuming there's not
> > then the above should produce the results faster. This of course works ok
> > for SUM(), but for something like AVG() or STDDEV() the combine/merge
> > aggregate functions would be required to process those intermediate
> > aggregate results that were produced by the sub-query.
>
> Any chance you could implement that in the planner?
>
It likely needs planner enhancement prior to other applications...
http://www.postgresql.org/message-id/CA+TgmobgWKHfZc09B+s2LxJTwoRD5Ht-avoVDvaQ4+RpwrO4bg@mail.gmail.com

Once planner allowed to have both of normal path and partial aggregation
paths to compare according to the cost, it is the straightforward way to
do.

Here are various academic research, for example, below is the good starting
point to clarify aggregate queries that we can run with 2-phase approach.
http://www.researchgate.net/publication/2715288_Performing_Group-By_before_Join

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2015-07-27 01:40:35 Documentation tweak for row-valued expressions and null
Previous Message Kouhei Kaigai 2015-07-26 23:42:07 Re: CustomScan and readfuncs.c