Re: Slow query with a lot of data

From: "Scott Carey" <scott(at)richrelevance(dot)com>
To: "Moritz Onken" <onken(at)houseofdesign(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query with a lot of data
Date: 2008-08-19 14:49:29
Message-ID: a1ec7d000808190749n43cdf29ap538c21b09cce1eac@mail.gmail.com
Lists: pgsql-performance

What is your work_mem set to? The default?
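
If you are not sure, you can check the current value from any session (or
look in postgresql.conf):

    -- show the value of work_mem for the current session
    SHOW work_mem;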

Try increasing it significantly, if you have the RAM, and see whether that
changes the explain plan. You may even want to set it to a number larger
than the RAM you have, just to see what happens. In all honesty, it may be
faster to overflow to OS swap space than to sort too many rows, but ONLY if
it changes the plan to a significantly more efficient one.

Simply type
SET work_mem = '500MB';
before running your explain. Set it to even more RAM if you have the space
for this experiment.
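
Against the aggregate query you posted (quoted below), the whole experiment
would look roughly like this; 500MB is just a starting value for the test,
and EXPLAIN on its own only plans the INSERT, it does not execute it:

    -- raise work_mem for this session only
    SET work_mem = '500MB';

    EXPLAIN
    insert into setup1 (select
        a."user", b.category, sum(1.0/b.cat_count)::float
        from result a, domain_categories b
        where a."domain" = b."domain"
        and b.depth < 4
        and a.results > 100
        and a."user" < 30000
        group by a."user", b.category);

    -- go back to the configured default afterwards
    RESET work_mem;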

In my experience the performance of aggregates on large tables is
significantly affected by work_mem, and the optimizer will choose poorly
without enough of it. It will rule out plans that might be fast enough even
when overflowing to disk, in favor of colossal sorts (which likely also
overflow to disk, but take hours or days).
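
A related check: if you are on 8.3, EXPLAIN ANALYZE reports how each sort
was actually performed, which tells you directly whether work_mem was the
limiting factor. A sketch, reusing the restricted SELECT from your query
below (EXPLAIN ANALYZE really executes the statement, so keep the "user"
filter):

    -- look for the "Sort Method" lines in the output:
    --   "quicksort  Memory: ..."      means the sort fit in work_mem
    --   "external merge  Disk: ..."   means it spilled to disk
    EXPLAIN ANALYZE
    select a."user", b.category, sum(1.0/b.cat_count)::float
    from result a, domain_categories b
    where a."domain" = b."domain"
    and b.depth < 4
    and a.results > 100
    and a."user" < 30000
    group by a."user", b.category;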

On Tue, Aug 19, 2008 at 5:47 AM, Moritz Onken <onken(at)houseofdesign(dot)de> wrote:

>
> On 19.08.2008 at 14:17, Matthew Wakeling wrote:
>
>> On Tue, 19 Aug 2008, Moritz Onken wrote:
>>
>>> tablename | attname | n_distinct | correlation
>>> result | domain | 1642 | 1
>>>
>>
>> Well, the important thing is the correlation, which is 1, indicating that
>> Postgres knows that the table is clustered. So I have no idea why it is
>> sorting the entire table.
>>
>> What happens when you run EXPLAIN SELECT * FROM result ORDER BY domain?
>>
>>
> "Index Scan using result_domain_idx on result (cost=0.00..748720.72
> rows=20306816 width=49)"
> ... as it should be.
>
>>>> Sounds like an awfully long time to me. Also, I think restricting it to
>>>> 280 users is probably not making it any faster.
>>>>
>>>
>>> If I hadn't restricted it to 280 users it would have run ~350 days...
>>>
>>
>> What makes you say that? Perhaps you could post EXPLAINs of both of the
>> queries.
>>
>> Matthew
>>
>
> That was just a guess. The query needs to retrieve the data for about
> 50,000 users. But it should be fast if I don't retrieve the data for
> specific users but instead let it run through all rows.
>
> explain insert into setup1 (select
> a."user", b.category, sum(1.0/b.cat_count)::float
> from result a, domain_categories b
> where a."domain" = b."domain"
> and b.depth < 4
> and a.results > 100
> and a."user" < 30000
> group by a."user", b.category);
>
>
> "GroupAggregate (cost=11745105.66..12277396.81 rows=28704 width=12)"
> " -> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
> " Sort Key: a."user", b.category"
> " -> Merge Join (cost=149241.25..1287278.89 rows=53171707
> width=12)"
> " Merge Cond: (b.domain = a.domain)"
> " -> Index Scan using domain_categories_domain on
> domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
> " Filter: (depth < 4)"
> " -> Materialize (cost=148954.16..149446.36 rows=39376
> width=8)"
> " -> Sort (cost=148954.16..149052.60 rows=39376
> width=8)"
> " Sort Key: a.domain"
> " -> Bitmap Heap Scan on result a
> (cost=1249.93..145409.79 rows=39376 width=8)"
> " Recheck Cond: ("user" < 30000)"
> " Filter: (results > 100)"
> " -> Bitmap Index Scan on result_user_idx
> (cost=0.00..1240.08 rows=66881 width=0)"
> " Index Cond: ("user" < 30000)"
>
>
> This query limits the number of users to 215 and took about 50 minutes.
> I could create two temp tables which contain only those records that I
> need for this query. Would this be a good idea?
>
>
> moritz
>
>
>
>
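
On the temp-table idea from the quoted mail: a minimal sketch of what that
could look like, with table and column names taken from the query above
(the *_subset names are just placeholders). Whether it actually helps is
something only a test will tell.

    -- keep only the rows and columns the aggregate actually needs
    create temporary table result_subset as
        select "user", "domain"
        from result
        where results > 100 and "user" < 30000;

    create temporary table domain_categories_subset as
        select "domain", category, cat_count
        from domain_categories
        where depth < 4;

    -- give the planner statistics on the new tables
    analyze result_subset;
    analyze domain_categories_subset;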
