On 21.08.2008, at 09:04, Moritz Onken wrote:
> On 20.08.2008, at 20:28, Tom Lane wrote:
>> "Scott Carey" <scott(at)richrelevance(dot)com> writes:
>>> The planner actually thinks there will only be 28704 rows returned
>>> of width 12. But it chooses to sort 53 million rows before
>>> aggregating. Either a bug or there's something else wrong here.
>>> That is the wrong way to aggregate those results no matter how much
>>> work_mem you have, unless I'm completely missing something...
>> That does look weird. What are the datatypes of the columns being
>> grouped by? Maybe they're not hashable?
>> Another forcing function that prevents use of HashAgg is DISTINCT
>> aggregates, but you don't seem to have any in this query...
>> regards, tom lane
> The datatypes are both integers. There is no DISTINCT in this query.
> Thanks anyway!
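(As an aside, Tom's two conditions are easy to check on a toy table.
The following is only a sketch; the table t and its contents are made
up for illustration.)

create temp table t as
  select (i % 100) as g, i as v from generate_series(1, 100000) as s(i);
analyze t;

-- integers are hashable, so this can use HashAggregate:
explain select g, sum(v) from t group by g;

-- a DISTINCT aggregate forces a sort-based GroupAggregate instead:
explain select g, count(distinct v) from t group by g;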
insert into setup1 (
select a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
group by a."user", b.category
);
This query inserted a total of 16,000,000 rows and, with work_mem set,
took about 24 hours.
Any more ideas to speed this up?
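One diagnostic that might help (a sketch using the same tables as
above; the work_mem value below is just a placeholder, not my actual
setting): disable sort-based plans for the session and see whether the
planner can produce a HashAggregate plan at all.

set work_mem = '500MB';  -- placeholder value for illustration
set enable_sort = off;   -- makes sort-based plans look very expensive
explain
select a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
group by a."user", b.category;
reset enable_sort;

If the plan still shows Sort + GroupAggregate, something genuinely
prevents the hash path; if it switches to HashAggregate, it is a
costing decision rather than a hard limitation.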