Re: The Future of Aggregation

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "kevin(dot)grittner(at)enterprisedb(dot)com" <kevin(dot)grittner(at)enterprisedb(dot)com>, "amit(dot)kapila(at)enterprisedb(dot)com" <amit(dot)kapila(at)enterprisedb(dot)com>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: The Future of Aggregation
Date: 2015-06-10 02:51:05
Message-ID: CAKJS1f-cMeRGi5sw48svaPLzsady5zXrwCd6kWh9P3a=wHBDcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10 June 2015 at 02:52, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:

> The idea I discussed in the link in item 5 above gets around this
> > problem, but it's a perhaps more surprise filled implementation
> > as it will mean "select avg(x),sum(x),count(x) from t" is
> > actually faster than "select sum(x),count(x) from t" as the agg
> > state for avg() will satisfy sum and count too.
>
> I'm skeptical that it will be noticeably faster. It's easy to see
> why this optimization will make a query *with all three* faster,
> but I would not expect the process of accumulating the sum and
> count to be about the same speed whether performed by one
> transition function or two. Of course I could be persuaded by a
> benchmark showing otherwise.
>
>
Thanks for looking at this.

Assuming that if we reuse the avg(x) state for count(x) and sum(x) then it
will perform almost exactly like a query containing just avg(x), the only
additional overhead is the call to the final functions per group, so in the
following case that's likely immeasurable:

/* setup */ create table millionrowtable as select
generate_series(1,1000000)::numeric as x;
/* test 1 */ SELECT sum(x) / count(x) from millionrowtable;
/* test 2 */ SELECT avg(x) from millionrowtable;

Test 1:
274.979 ms
272.104 ms
269.915 ms

Test 2:
229.619 ms
220.703 ms
234.743 ms

(About 19% slower)

The good news is that it's not slower than before, so should be acceptable,
though hard to explain to people.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-06-10 02:53:55 Re: The Future of Aggregation
Previous Message David Rowley 2015-06-10 02:32:14 Re: Aggregate Supporting Functions