SLOW query with aggregates

From: "A Palmblad" <adampalmblad(at)yahoo(dot)ca>
To: "Postgres Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: SLOW query with aggregates
Date: 2004-03-23 20:03:48
Message-ID: 003201c41111$f4dc96f0$97019696@AERS04
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I am trying to optimize a query that does a lot of aggregation. I have a
large number of columns that are part of the result, and most are
aggregates. They are acting on two temporary tables, the largest of which
should have at most 1 million tuples, and the smaller around 5000; the the
smaller table matches the number of rows expecting in the result. I've
played around with some indexes on the temp tables, and analyzing them; even
using a vacuum analyze, and the worst part is always a groupAggregate.

This query can be optimized at the expense of other operations; it will
be run during low usage hours. I have tried to bump up sort_mem to get the
query optimizer to cosider a HashAggregate instread of a groupAggregate;
setting it as high as 2 gigs still had the query optimizer using
GroupAggregate.

The troublesome query is:

select
tempItems.category_id,
date('2003-11-22'),
sum(a) as a,
count(*) as b,
sum(case when type = 1 then 0 else someNumber end) as successful,
sum(c) as c,
........
........
tempAggregates.mode as mode
-variations of the above repeated around 30 times, with a few other
aggregates like min and max making an appearance, and some array stuff
from tempItems join tempAggregates using (category_id)
group by tempItems.category_id, mode

I've tried just grouping by category_id, and doing a max(mode), but that
doesn't have much of an effect on performance; although row estimation for
the group aggregate was better. A lot is being done, so maybe I can't get
it to be much more efficient...

Here's the output of an explain analyze:

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
---
GroupAggregate (cost=0.00..338300.34 rows=884 width=345) (actual
time=86943.272..382718.104 rows=3117 loops=1)
-> Merge Join (cost=0.00..93642.52 rows=1135610 width=345) (actual
time=0.148..24006.748 rows=1120974 loops=1)
Merge Cond: ("outer".category_id = "inner".category_id)
-> Index Scan using tempaggregatesindex on tempaggregates
(cost=0.00..91.31 rows=3119 width=115) (actual time=0.055..6.573 rows=3117
loops=1)
-> Index Scan using tempitemsindex on tempitems
(cost=0.00..79348.45 rows=1135610 width=241) (actual time=0.064..7511.980
rows=1121164 loops=1)
Total runtime: 382725.502 ms
(6 rows)

Any thoughts or suggestions would be appreciated.

-Adam Palmblad

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2004-03-23 20:13:29 Re: [ADMIN] Benchmarking postgres on Solaris/Linux
Previous Message Fabio Esposito 2004-03-23 19:21:34 postgres eating CPU on HP9000