Skip site navigation (1) Skip section navigation (2)

Re: General performance questions about postgres on Apple

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: General performance questions about postgres on Apple
Date: 2004-02-20 21:10:07
Message-ID: Pine.LNX.4.33.0402201405160.11556-100000@css120.ihs.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Fri, 20 Feb 2004, Sean Shanny wrote:

> max_connections = 100
> 
> # - Memory -
>  
> shared_buffers = 16000          # min 16, at least max_connections*2, 
> 8KB each
> sort_mem = 256000               # min 64, size in KB

You might wanna drop sort_mem somewhat and just set it during your imports 
to something big like 512000 or larger.  That way with 100 users during 
the day you won't have to worry about swap storms, and when you run your 
updates, you get all that sort_mem.

> Actual row count in the temp table:
> 
> select count(*) from referral_temp ;
>   502347
> 
> Actual row count in d_referral table:
> 
> select count(*) from d_referral ;
>   27908024
>  
> 
> Note: that an analyze had not been performed on the referral_temp table 
> prior to the explain analyze run.
> 
> explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT 
> OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5
> 
> Nested Loop Left Join  (cost=0.00..3046.00 rows=1001 width=68) (actual 
> time=136.513..6440616.541 rows=502347 loops=1)
>    ->  Seq Scan on referral_temp t2  (cost=0.00..20.00 rows=1000 
> width=64) (actual time=21.730..10552.421 rows=502347 loops=1)
>    ->  Index Scan using d_referral_referral_md5_key on d_referral t1  
> (cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1 
> loops=502347)
>          Index Cond: ("outer".md5 = t1.referral_md5)
> 
> 
> Thanks.
> 
> --sean
>  Total runtime: 6441969.698 ms
> (5 rows)
> 
> 
> Here is an explain analyze after the analyze was done.  Unfortunately I 
> think a lot of the data was still in cache when I did this again :-(
> 
> explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT 
> OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
> 
> Nested Loop Left Join  (cost=0.00..1468759.69 rows=480082 width=149) 
> (actual time=69.576..3226854.850 rows=502347 loops=1)
>    ->  Seq Scan on referral_temp t2  (cost=0.00..16034.81 rows=480081 
> width=145) (actual time=11.206..4003.521 rows=502347 loops=1)
>    ->  Index Scan using d_referral_referral_md5_key on d_referral t1  
> (cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1 
> loops=502347)
>          Index Cond: ("outer".md5 = t1.referral_md5)
>  Total runtime: 3227830.752 ms

Hmmm.  It looks like postgresql is still picking a nested loop when it 
should be sorting something faster.  Try doing a "set enable_nestloop = 
off" and see what you get.

If that makes it faster, you may want to adjust the costs of the cpu_* 
stuff higher to see if that can force it to do the right thing.

Looking at the amount of time taken by the nested loop, it looks like the 
problem to me.

And why are you doing a left join of ONE row from one table against the 
whole temp table?  Do you really need to do that?  since there's only one 
row in the source table, and I'd guess is only matches one or a few rows 
from the temp table, this means you're gonna have that one row and a bunch 
of null filled rows to go with it.


In response to

Responses

pgsql-performance by date

Next:From: Sean ShannyDate: 2004-02-20 21:57:42
Subject: Re: General performance questions about postgres on Apple
Previous:From: Bruno Wolff IIIDate: 2004-02-20 20:11:39
Subject: Re: Index called with Union but not with OR clause

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group