Skip site navigation (1) Skip section navigation (2)

SLOW query with aggregates

From: "A Palmblad" <adampalmblad(at)yahoo(dot)ca>
To: "Postgres Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: SLOW query with aggregates
Date: 2004-03-23 20:03:48
Message-ID: 003201c41111$f4dc96f0$97019696@AERS04 (view raw or flat)
Thread:
Lists: pgsql-performance
I am trying to optimize a query that does a lot of aggregation.  I have a
large number of columns that are part of the result, and most are
aggregates.  They are acting on two temporary tables, the largest of which
should have at most 1 million tuples, and the smaller around 5000; the the
smaller table matches the number of rows expecting in the result.  I've
played around with some indexes on the temp tables, and analyzing them; even
using a vacuum analyze, and the worst part is always a groupAggregate.

    This query can be optimized at the expense of other operations; it will
be run during low usage hours.  I have tried to bump up sort_mem to get the
query optimizer to cosider a HashAggregate instread of a groupAggregate;
setting it as high as 2 gigs still had the query optimizer using
GroupAggregate.

The troublesome query is:

select
    tempItems.category_id,
    date('2003-11-22'),
    sum(a) as a,
    count(*) as b,
    sum(case when type = 1 then 0 else someNumber end) as successful,
    sum(c) as c,
    ........
    ........
    tempAggregates.mode as mode
    -variations of the above repeated around 30 times, with a few other
aggregates like min and max making an appearance, and some array stuff
   from tempItems join tempAggregates using (category_id)
   group by tempItems.category_id, mode

I've tried just grouping by category_id, and doing a max(mode), but that
doesn't have much of an effect on performance; although row estimation for
the group aggregate was better.  A lot is being done, so maybe I can't get
it to be much more efficient...

Here's the output of an explain analyze:


QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
---
 GroupAggregate  (cost=0.00..338300.34 rows=884 width=345) (actual
time=86943.272..382718.104 rows=3117 loops=1)
   ->  Merge Join  (cost=0.00..93642.52 rows=1135610 width=345) (actual
time=0.148..24006.748 rows=1120974 loops=1)
         Merge Cond: ("outer".category_id = "inner".category_id)
         ->  Index Scan using tempaggregatesindex on tempaggregates
(cost=0.00..91.31 rows=3119 width=115) (actual time=0.055..6.573 rows=3117
loops=1)
         ->  Index Scan using tempitemsindex on tempitems
(cost=0.00..79348.45 rows=1135610 width=241) (actual time=0.064..7511.980
rows=1121164 loops=1)
 Total runtime: 382725.502 ms
(6 rows)

Any thoughts or suggestions would be appreciated.

-Adam Palmblad


Responses

pgsql-performance by date

Next:From: Josh BerkusDate: 2004-03-23 20:13:29
Subject: Re: [ADMIN] Benchmarking postgres on Solaris/Linux
Previous:From: Fabio EspositoDate: 2004-03-23 19:21:34
Subject: postgres eating CPU on HP9000

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group