Re: Speeding up aggregates

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up aggregates
Date: 2002-12-09 21:48:21
Message-ID: 3DF50FA5.3090000@joeconway.com
Lists: pgsql-performance

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>
>>Just to follow up on my last post, I did indeed find that bumping up sort_mem
>>caused a switch back to HashAggregate, and a big improvement:
>
>
>>parts=# explain analyze select i.part_id, sum(w.qty_oh) as total_oh from inv i, iwhs w where i.part_id = w.part_id group by i.part_id having sum(w.qty_oh) > 0;
>>                                  QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------------
>> HashAggregate (cost=5254.46..5432.10 rows=35528 width=36) (actual time=1286.89..1399.36 rows=4189 loops=1)
>>   Filter: (sum(qty_oh) > 0::double precision)
>>   ->  Hash Join (cost=1319.10..4710.31 rows=72553 width=36) (actual time=163.36..947.54 rows=72548 loops=1)
>
>
> How many rows out if you drop the HAVING clause?

parts=# set sort_mem to 8000;
SET
parts=# explain analyze select i.part_id, sum(w.qty_oh) as total_oh from inv i, iwhs w where i.part_id = w.part_id group by i.part_id;
                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate (cost=5617.22..5706.04 rows=35528 width=36) (actual time=1525.93..1627.41 rows=34575 loops=1)
   ->  Hash Join (cost=1319.10..5254.45 rows=72553 width=36) (actual time=156.86..1248.73 rows=72548 loops=1)
         Hash Cond: ("outer".part_id = "inner".part_id)
         ->  Seq Scan on iwhs w (cost=0.00..2121.53 rows=72553 width=22) (actual time=0.01..274.00 rows=72553 loops=1)
         ->  Hash (cost=1230.28..1230.28 rows=35528 width=14) (actual time=156.65..156.65 rows=0 loops=1)
               ->  Seq Scan on inv i (cost=0.00..1230.28 rows=35528 width=14) (actual time=0.03..86.86 rows=35528 loops=1)
 Total runtime: 1680.86 msec
(7 rows)
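
As a cross-check on the actual group count (the rows=34575 on the HashAggregate node above), the same number can be counted directly. A quick sketch against the same inv/iwhs schema:

select count(*) as group_count
  from (select i.part_id
          from inv i, iwhs w
         where i.part_id = w.part_id
         group by i.part_id) as grouped;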

> The planner's choice of which to use is dependent on its estimate of the
> required hashtable size, which is proportional to its guess about how
> many distinct groups there will be. The above output doesn't tell us
> that however, only how many groups passed the HAVING clause. I'm
> curious about the quality of this estimate, since the code to try to
> generate not-completely-bogus group count estimates is all new ...
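
For what it's worth, the statistic feeding that group-count guess is visible in pg_stats. A sketch, assuming both tables have been ANALYZEd recently:

-- n_distinct is the planner's input for estimating the number of groups;
-- negative values mean a fraction of the table's row count.
select tablename, attname, n_distinct
  from pg_stats
 where tablename in ('inv', 'iwhs')
   and attname = 'part_id';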

If I'm reading it correctly, it looks like the estimate in this case is pretty
good: the planner guessed 35528 groups, and the actual count came out at 34575.
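
And if anyone wants to compare the two aggregate strategies head to head, hash
aggregation can be switched off per-session. A sketch, assuming a build new
enough to have the enable_hashagg setting:

set enable_hashagg to off;  -- forces the sort-based GroupAggregate plan
explain analyze select i.part_id, sum(w.qty_oh) as total_oh
  from inv i, iwhs w where i.part_id = w.part_id group by i.part_id;
set enable_hashagg to on;   -- back to letting the planner choose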

Joe
