Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-01-21 21:08:26
Message-ID: CAKJS1f-RiKMH3FTABnrXWtU+uMrEA3a+at47GdUNu+u7-ySR8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22 January 2016 at 06:56, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jan 20, 2016 at 8:32 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> The other two usages which I have thought of are;
>>
>> 1) Aggregating before UNION ALL, which might be fairly simple after the
>> grouping planner changes, as it may just be a matter of considering another
>> "grouping path" which partially aggregates before the UNION ALL, and
>> performs the final grouping stage after UNION ALL. At this stage it's hard
>> to say how that will work as I'm not sure how far changes to the grouping
>> planner will go. Perhaps Tom can comment?
>
> I hope he will, but in the meantime let me ask how this does us any
> good. UNION ALL turns into an Append node. Pushing aggregation
> through an Append doesn't make anything faster AFAICS *unless* you can
> further optimize beginning at that point. For example, if one or both
> sides of the Append node have a Gather under them, and you can push
> the partial aggregation down into the Gather; or if you hit a Foreign
> Scan and can have it do partial aggregation on the remote side, you
> win. But if you just have:
>
> Finalize Aggregate
> -> Append
> -> Partial Aggregate
> -> Thing One
> -> Partial Aggregate
> -> Thing Two
>
> Instead of:
>
> Aggregate
> -> Append
> -> Thing One
> -> Thing Two
>
> ...then I don't see the point, at least not for a single-group
> Aggregate or HashAggregate. For a GroupAggregate, Thing One and Thing
> Two might need to be sorted, and sorting two or more smaller data sets
> might be faster than sorting one larger data set, but I can't see us
> winning anything very big here.
>
> To be clear, I'm not saying we shouldn't do this. I just don't think
> it does much *by itself*. If you combine it with other optimizations
> that let the aggregation be pushed further down the plan tree, it
> could win big.

Yes, I agree, it's not a big win, at least not in the case of a serial
plan. If each branch of the UNION ALL could be processed in parallel,
then that's different.

It's quite simple to test how much of a win it'll be in the serial
case today, and yes, it's not much, but it's a bit.

create table t1 as select x from generate_series(1,1000000) x(x);
vacuum analyze t1;
select count(*) from (select * from t1 union all select * from t1) t;
count
---------
2000000
(1 row)

Time: 185.793 ms

-- Mock up pushed down aggregation by using sum() as a combine
function for count(*)
select sum(c) from (select count(*) c from t1 union all select
count(*) from t1) t;
sum
---------
2000000
(1 row)

Time: 162.076 ms

Not particularly incredible, but we don't normally turn our noses up
at a 14% improvement, so let's just see how complex it will be to
implement, once the upper planner changes are done.

But as you mention about lack of ability to make use of pre-sorted
Path for each branch of the UNION ALL; I was really hoping Tom's patch
will improve that part by allowing the planner to choose a pre-sorted
Path and perform a MergeAppend instead of an Append, which would allow
pre-sorted input into a GroupAggregate node.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomasz Rybak 2016-01-21 22:13:37 Re: pglogical_output - a general purpose logical decoding output plugin
Previous Message Vladimir Sitnikov 2016-01-21 20:55:43 Re: Set search_path + server-prepared statements = cached plan must not change result type