Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-01-19 04:00:21
Message-ID: CAKJS1f8rPq14pZWH_HmLH1gWPcRwzsyUOykaCTTPHdo5xacf7Q@mail.gmail.com
Lists: pgsql-hackers

On 19 January 2016 at 06:03, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>
>
> >
>> > # explain analyze select a%1000000,length(string_agg(b,',')) from ab group by 1;
>> >                             QUERY PLAN
>> > ---------------------------------------------------------------------------------------------------------------------------
>> >  GroupAggregate  (cost=119510.84..144510.84 rows=1000000 width=32) (actual time=538.938..1015.278 rows=1000000 loops=1)
>> >    Group Key: ((a % 1000000))
>> >    ->  Sort  (cost=119510.84..122010.84 rows=1000000 width=32) (actual time=538.917..594.194 rows=1000000 loops=1)
>> >          Sort Key: ((a % 1000000))
>> >          Sort Method: quicksort  Memory: 102702kB
>> >          ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32) (actual time=0.016..138.964 rows=1000000 loops=1)
>> >  Planning time: 0.146 ms
>> >  Execution time: 1047.511 ms
>> >
>> >
>> >
>> > Patched
>> > # explain analyze select a%1000000,length(string_agg(b,',')) from ab group by 1;
>> >                            QUERY PLAN
>> > ------------------------------------------------------------------------------------------------------------------------
>> >  HashAggregate  (cost=24853.00..39853.00 rows=1000000 width=32) (actual time=8072.346..144424.872 rows=1000000 loops=1)
>> >    Group Key: (a % 1000000)
>> >    ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32) (actual time=0.025..481.332 rows=1000000 loops=1)
>> >  Planning time: 0.164 ms
>> >  Execution time: 263288.332 ms
>>
>> Well, that's pretty odd. I guess the plan change must be a result of
>> switching the transition type from internal to text, although I'm not
>> immediately certain why that would make a difference.
>>
>
> It is strange. Why is HashAggregate so slow?
>

Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation, it appears that's because the patch creates a memory
context per aggregated group, and in this case there are 1 million of them.
That means 1 million contexts, each ALLOCSET_SMALL_INITSIZE (1KB) in size,
or about 1GB of memory in total, which is more than my VM likes.
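A quick back-of-envelope check of that figure (a hypothetical query, not from the test case above, assuming 1KB of initial allocation per context as described):

```sql
-- Rough estimate of the per-group context overhead: 1 million groups,
-- each owning a memory context of at least ALLOCSET_SMALL_INITSIZE (1KB).
-- pg_size_pretty() just renders the byte count in readable units.
SELECT pg_size_pretty(1000000::bigint * 1024);  -- roughly 1GB, before any transition state is stored
```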

Setting work_mem = '130MB' does coax the planner into a GroupAggregate
plan, which is faster, but only because the hash agg executor code pays no
regard to work_mem: if I set work_mem to 140MB (which is more realistic for
this VM), the same swapping problems occur. Setting aggtransspace for this
aggregate to 1024 would probably help the costing problem, but it would
also make hashagg a less likely choice during planning.
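For reference, aggtransspace can be declared in CREATE AGGREGATE; a hedged sketch of what that might look like (the aggregate name is hypothetical, though string_agg_transfn and string_agg_finalfn are the catalog functions backing the built-in string_agg):

```sql
-- Hypothetical aggregate declaring an expected 1KB of per-group transition
-- state, so the planner's hash table size estimate reflects the real cost.
CREATE AGGREGATE my_string_agg(text, text) (
    sfunc = string_agg_transfn,      -- transition function, as for string_agg
    stype = internal,
    finalfunc = string_agg_finalfn,  -- final function, as for string_agg
    aggtransspace = 1024             -- planner assumes 1KB of state per group
);
```

With aggtransspace set, the planner's memory estimate for a HashAggregate grows with the group count, making it likelier to fall back to GroupAggregate when the hash table would exceed work_mem.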

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
