Re: possible optimization: push down aggregates

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible optimization: push down aggregates
Date: 2014-08-27 19:52:06
Message-ID: CAHyXU0zObaPrkk7ECyGVa19hjxyV0JXHf8NUnsueLwtzwCkpRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 27, 2014 at 2:46 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Wed, Aug 27, 2014 at 4:41 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> Hi
>>>
>>> one user asked about using a partitioning for faster aggregates queries.
>>>
>>> I found so there is not any optimization.
>>>
>>> create table x1(a int, d date);
>>> create table x_1 ( check(d = '2014-01-01'::date)) inherits(x1);
>>> create table x_2 ( check(d = '2014-01-02'::date)) inherits(x1);
>>> create table x_3 ( check(d = '2014-01-03'::date)) inherits(x1);
>>>
>>> When I have this schema, then optimizer try to do
>>>
>>> postgres=# explain verbose select max(a) from x1 group by d order by d;
>>> QUERY PLAN
>>> --------------------------------------------------------------------------------
>>> GroupAggregate (cost=684.79..750.99 rows=200 width=8)
>>> Output: max(x1.a), x1.d
>>> Group Key: x1.d
>>> -> Sort (cost=684.79..706.19 rows=8561 width=8)
>>> Output: x1.d, x1.a
>>> Sort Key: x1.d
>>> -> Append (cost=0.00..125.60 rows=8561 width=8)
>>> -> Seq Scan on public.x1 (cost=0.00..0.00 rows=1 width=8)
>>> Output: x1.d, x1.a
>>> -> Seq Scan on public.x_1 (cost=0.00..31.40 rows=2140
>>> width=8)
>>> Output: x_1.d, x_1.a
>>> -> Seq Scan on public.x_2 (cost=0.00..31.40 rows=2140
>>> width=8)
>>> Output: x_2.d, x_2.a
>>> -> Seq Scan on public.x_3 (cost=0.00..31.40 rows=2140
>>> width=8)
>>> Output: x_3.d, x_3.a
>>> -> Seq Scan on public.x_4 (cost=0.00..31.40 rows=2140
>>> width=8)
>>> Output: x_4.d, x_4.a
>>> Planning time: 0.333 ms
>>>
>>> It can be reduced to:
>>>
>>> sort by d
>>> Append
>>> Aggegate (a), d
>>> seq scan from x_1
>>> Aggregate (a), d
>>> seq scan from x_2
>>>
>>> Are there some plans to use partitioning for aggregation?
>>
>> Besides min/max, what other aggregates (mean/stddev come to mind)
>> would you optimize and how would you determine which ones could be?
>> Where is that decision made?
>
>
> You can't with mean and stddev, only with associative aggregates.

associative bit just makes it easier (which is important of course!).
mean for example can be pushed down if the 'pushed down' aggregates
return to the count to the "reaggregator" so that you can weight the
final average. that's a lot more complicated though.

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-08-27 19:52:18 Re: SKIP LOCKED DATA (work in progress)
Previous Message Pavel Stehule 2014-08-27 19:48:34 Re: possible optimization: push down aggregates