Re: Slow query with a lot of data

From: Moritz Onken <onken(at)houseofdesign(dot)de>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query with a lot of data
Date: 2008-08-18 14:43:16
Message-ID: 8EF0BB61-D8FE-47E8-97DD-1009F7EB5431@houseofdesign.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Am 18.08.2008 um 16:30 schrieb Matthew Wakeling:

> On Mon, 18 Aug 2008, Moritz Onken wrote:
>> I have indexes on result.domain, domain_categories.domain,
>> result.user, domain_categories.category. Clustered result on user
>> and domain_categories on domain.
>
>> " -> Materialize (cost=2118752.28..2270064.64
>> rows=12104989 width=8) (actual time=46460.599..82336.116
>> rows=12123161 loops=1)"
>> " -> Sort (cost=2118752.28..2149014.75 rows=12104989
>> width=8) (actual time=46460.592..59595.851 rows=12104989 loops=1)"
>> " Sort Key: b.domain"
>> " Sort Method: external sort Disk: 283992kB"
>> " -> Seq Scan on domain_categories b
>> (cost=0.00..198151.89 rows=12104989 width=8) (actual
>> time=14.352..22572.869 rows=12104989 loops=1)"
>
> This is weird, given you say you have clustered domain_categories on
> domain. Have you analysed? You should be able to run:
>
> EXPLAIN SELECT * from domain_categories ORDER BY domain
>
> and have it say "Index scan" instead of "Seq Scan followed by disc
> sort)".
>
> Matthew
>

Thanks, the index was created but I forgot to run analyze again on
that table.

I had a little mistake in my previous sql query. The corrected version
is this:
explain analyze select a."user", b.category, count(1) from result a,
domain_categories b where a."domain" = b."domain" and a."user" = 1337
group by a."user", b.category;

(notice the additional group by column).

explain analyze:

"HashAggregate (cost=817397.78..817428.92 rows=2491 width=8) (actual
time=42874.339..42878.419 rows=3361 loops=1)"
" -> Merge Join (cost=748.47..674365.50 rows=19070970 width=8)
(actual time=15702.449..42829.388 rows=36308 loops=1)"
" Merge Cond: (b.domain = a.domain)"
" -> Index Scan using domain_categories_domain on
domain_categories b (cost=0.00..391453.79 rows=12105014 width=8)
(actual time=39.018..30166.349 rows=12104989 loops=1)"
" -> Sort (cost=395.52..405.49 rows=3985 width=8) (actual
time=0.188..32.345 rows=36309 loops=1)"
" Sort Key: a.domain"
" Sort Method: quicksort Memory: 27kB"
" -> Index Scan using result_user_idx on result a
(cost=0.00..157.21 rows=3985 width=8) (actual time=0.021..0.101
rows=61 loops=1)"
" Index Cond: ("user" = 1337)"
"Total runtime: 42881.382 ms"

This is still very slow...

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2008-08-18 14:57:22 Re: Slow query with a lot of data
Previous Message Matthew Wakeling 2008-08-18 14:30:42 Re: Slow query with a lot of data