
Re: Speeding up aggregates

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up aggregates
Date: 2002-12-09 21:48:21
Message-ID: 3DF50FA5.3090000@joeconway.com
Lists: pgsql-performance

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
> 
>>Just to follow up on my last post, I did indeed find that bumping up sort_mem
>>caused a switch back to HashAggregate, and a big improvement:
> 
> 
>>parts=# explain analyze select i.part_id, sum(w.qty_oh) as total_oh from inv i, iwhs w where i.part_id = w.part_id group by i.part_id having sum(w.qty_oh) > 0;
>>                                                          QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------------
>>  HashAggregate  (cost=5254.46..5432.10 rows=35528 width=36) (actual time=1286.89..1399.36 rows=4189 loops=1)
>>    Filter: (sum(qty_oh) > 0::double precision)
>>    ->  Hash Join  (cost=1319.10..4710.31 rows=72553 width=36) (actual time=163.36..947.54 rows=72548 loops=1)
> 
> 
> How many rows out if you drop the HAVING clause?

parts=# set sort_mem to 8000;
SET
parts=# explain analyze select i.part_id, sum(w.qty_oh) as total_oh from inv i, iwhs w where i.part_id = w.part_id group by i.part_id;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=5617.22..5706.04 rows=35528 width=36) (actual time=1525.93..1627.41 rows=34575 loops=1)
    ->  Hash Join  (cost=1319.10..5254.45 rows=72553 width=36) (actual time=156.86..1248.73 rows=72548 loops=1)
          Hash Cond: ("outer".part_id = "inner".part_id)
          ->  Seq Scan on iwhs w  (cost=0.00..2121.53 rows=72553 width=22) (actual time=0.01..274.00 rows=72553 loops=1)
          ->  Hash  (cost=1230.28..1230.28 rows=35528 width=14) (actual time=156.65..156.65 rows=0 loops=1)
                ->  Seq Scan on inv i  (cost=0.00..1230.28 rows=35528 width=14) (actual time=0.03..86.86 rows=35528 loops=1)
  Total runtime: 1680.86 msec
(7 rows)
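
For anyone wanting to reproduce the comparison, it is just the same query
run at two sort_mem settings. A minimal sketch follows; treating 1024 as
the stock default is an assumption, and the exact crossover point will
depend on your own data and statistics:

   set sort_mem to 1024;   -- assumed default: the hash table is not
                           -- expected to fit, so the planner avoids
                           -- HashAggregate in favor of a sort-based plan
   explain analyze select i.part_id, sum(w.qty_oh) as total_oh
     from inv i, iwhs w
    where i.part_id = w.part_id
    group by i.part_id;

   set sort_mem to 8000;   -- roomier limit: the hash table is expected
                           -- to fit, so the planner picks HashAggregate
   explain analyze select i.part_id, sum(w.qty_oh) as total_oh
     from inv i, iwhs w
    where i.part_id = w.part_id
    group by i.part_id;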


> The planner's choice of which to use is dependent on its estimate of the
> required hashtable size, which is proportional to its guess about how
> many distinct groups there will be.  The above output doesn't tell us
> that however, only how many groups passed the HAVING clause.  I'm
> curious about the quality of this estimate, since the code to try to
> generate not-completely-bogus group count estimates is all new ...

If I'm reading it correctly, the estimate in this case is pretty good: the 
HashAggregate node above estimates 35528 groups and gets 34575 actual rows.
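
For what it's worth, the raw ingredient for that estimate should be the
per-column n_distinct figure in pg_stats. A quick sketch for checking it
(assumes the tables have been ANALYZEd; negative n_distinct values denote
a fraction of the row count rather than an absolute number):

   select tablename, attname, n_distinct
     from pg_stats
    where tablename in ('inv', 'iwhs')
      and attname = 'part_id';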

Joe


