Re: Slow query with a lot of data

From: Moritz Onken <onken(at)houseofdesign(dot)de>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query with a lot of data
Date: 2008-08-19 12:47:33
Message-ID: 380CD097-4A33-4134-8C13-B9BF34EC6C63@houseofdesign.de
Lists: pgsql-performance


On 19.08.2008, at 14:17, Matthew Wakeling wrote:

> On Tue, 19 Aug 2008, Moritz Onken wrote:
>> tablename | attname | n_distinct | correlation
>> result | domain | 1642 | 1
>
> Well, the important thing is the correlation, which is 1, indicating
> that Postgres knows that the table is clustered. So I have no idea
> why it is sorting the entire table.
>
> What happens when you run EXPLAIN SELECT * FROM result ORDER BY
> domain?
>

"Index Scan using result_domain_idx on result (cost=0.00..748720.72
rows=20306816 width=49)"
... as it should be.

>>> Sounds like an awfully long time to me. Also, I think restricting
>>> it to 280 users is probably not making it any faster.
>>
>> If I hadn't restricted it to 280 users it would have run ~350days...
>
> What makes you say that? Perhaps you could post EXPLAINs of both of
> the queries.
>
> Matthew

That was just a guess. The query needs to retrieve the data for about
50,000 users, but it should be fast if I don't restrict it to specific
users and just let it run through all rows.
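(Just to illustrate, the unrestricted version would simply drop the
user filter; I haven't actually run this one:)

insert into setup1 (select
a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
group by a."user", b.category);

The restricted query I actually ran, together with its plan: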

explain insert into setup1 (select
a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
and a."user" < 30000
group by a."user", b.category);

"GroupAggregate (cost=11745105.66..12277396.81 rows=28704 width=12)"
" -> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
" Sort Key: a."user", b.category"
" -> Merge Join (cost=149241.25..1287278.89 rows=53171707
width=12)"
" Merge Cond: (b.domain = a.domain)"
" -> Index Scan using domain_categories_domain on
domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
" Filter: (depth < 4)"
" -> Materialize (cost=148954.16..149446.36 rows=39376
width=8)"
" -> Sort (cost=148954.16..149052.60 rows=39376
width=8)"
" Sort Key: a.domain"
" -> Bitmap Heap Scan on result a
(cost=1249.93..145409.79 rows=39376 width=8)"
" Recheck Cond: ("user" < 30000)"
" Filter: (results > 100)"
" -> Bitmap Index Scan on
result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
" Index Cond: ("user" < 30000)"

This query limits the number of users to 215 and took about 50 minutes.
I could create two temp tables which hold only those records I need
for this query. Would that be a good idea?
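Something like this is what I have in mind (just a sketch, the temp
table names are made up and I haven't tested it yet):

begin;

create temp table tmp_result as
select "user", "domain", results
from result
where results > 100
and "user" < 30000;

create temp table tmp_domain_categories as
select "domain", category, cat_count
from domain_categories
where depth < 4;

analyze tmp_result;
analyze tmp_domain_categories;

insert into setup1 (select
a."user", b.category, sum(1.0/b.cat_count)::float
from tmp_result a, tmp_domain_categories b
where a."domain" = b."domain"
group by a."user", b.category);

commit;

That way the join and the aggregate would only see the pre-filtered
rows instead of the whole tables.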

moritz
