Re: Speeding up aggregates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up aggregates
Date: 2002-12-09 21:26:32
Message-ID: 19749.1039469192@sss.pgh.pa.us
Lists: pgsql-performance

Joe Conway <mail(at)joeconway(dot)com> writes:
> Just to follow up on my last post, I did indeed find that bumping up sort_mem
> caused a switch back to HashAggregate, and a big improvement:

> parts=# explain analyze select i.part_id, sum(w.qty_oh) as total_oh from inv i, iwhs w
> where i.part_id = w.part_id group by i.part_id having sum(w.qty_oh) > 0;
>                                                   QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=5254.46..5432.10 rows=35528 width=36) (actual time=1286.89..1399.36 rows=4189 loops=1)
>    Filter: (sum(qty_oh) > 0::double precision)
>    ->  Hash Join  (cost=1319.10..4710.31 rows=72553 width=36) (actual time=163.36..947.54 rows=72548 loops=1)
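
(For context, the sort_mem bump Joe describes would look roughly like this; the value below is purely illustrative, not the setting he actually used:)

    SET sort_mem = 16384;   -- per-backend sort/hash memory, in kilobytes (7.x GUC)
    EXPLAIN ANALYZE
      SELECT i.part_id, sum(w.qty_oh) AS total_oh
      FROM inv i, iwhs w
      WHERE i.part_id = w.part_id
      GROUP BY i.part_id
      HAVING sum(w.qty_oh) > 0;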

How many rows out if you drop the HAVING clause?
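
One way to check is simply to rerun the query with the HAVING clause dropped and read the "actual ... rows" figure off the HashAggregate line. A minimal sketch, reusing the table and column names from the query above:

    -- Same aggregate, no HAVING filter: the actual rows on the
    -- HashAggregate node is the true number of distinct groups.
    EXPLAIN ANALYZE
      SELECT i.part_id, sum(w.qty_oh) AS total_oh
      FROM inv i, iwhs w
      WHERE i.part_id = w.part_id
      GROUP BY i.part_id;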

The planner's choice of which aggregation plan to use depends on its estimate of the
required hashtable size, which is proportional to its guess at how many
distinct groups there will be. The above output doesn't tell us that,
however; it only shows how many groups passed the HAVING clause. I'm
curious about the quality of this estimate, since the code that tries to
generate not-completely-bogus group count estimates is all new ...
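
As a rough cross-check (a sketch reusing the tables from Joe's query), the planner's group-count guess, i.e. the rows=35528 estimate on the HashAggregate node, can be compared against the actual number of distinct part_id values coming out of the join:

    -- Actual number of distinct groups fed into the aggregate.
    SELECT count(DISTINCT w.part_id)
    FROM inv i, iwhs w
    WHERE i.part_id = w.part_id;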

regards, tom lane
