Re: General performance questions about postgres on Apple

From: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: General performance questions about postgres on Apple
Date: 2004-02-20 21:57:42
Message-ID: 403682D6.60807@earthlink.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

scott.marlowe wrote:

>On Fri, 20 Feb 2004, Sean Shanny wrote:
>
>
>
>>max_connections = 100
>>
>># - Memory -
>>
>>shared_buffers = 16000 # min 16, at least max_connections*2,
>>8KB each
>>sort_mem = 256000 # min 64, size in KB
>>
>>
>
>You might wanna drop sort_mem somewhat and just set it during your imports
>to something big like 512000 or larger. That way with 100 users during
>the day you won't have to worry about swap storms, and when you run your
>updates, you get all that sort_mem.
>
>
>
>>Actual row count in the temp table:
>>
>>select count(*) from referral_temp ;
>> 502347
>>
>>Actual row count in d_referral table:
>>
>>select count(*) from d_referral ;
>> 27908024
>>
>>
>>Note: that an analyze had not been performed on the referral_temp table
>>prior to the explain analyze run.
>>
>>explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
>>OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5
>>
>>Nested Loop Left Join (cost=0.00..3046.00 rows=1001 width=68) (actual
>>time=136.513..6440616.541 rows=502347 loops=1)
>> -> Seq Scan on referral_temp t2 (cost=0.00..20.00 rows=1000
>>width=64) (actual time=21.730..10552.421 rows=502347 loops=1)
>> -> Index Scan using d_referral_referral_md5_key on d_referral t1
>>(cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1
>>loops=502347)
>> Index Cond: ("outer".md5 = t1.referral_md5)
>>
>>
>>Thanks.
>>
>>--sean
>> Total runtime: 6441969.698 ms
>>(5 rows)
>>
>>
>>Here is an explain analyze after the analyze was done. Unfortunately I
>>think a lot of the data was still in cache when I did this again :-(
>>
>>explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
>>OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
>>
>>Nested Loop Left Join (cost=0.00..1468759.69 rows=480082 width=149)
>>(actual time=69.576..3226854.850 rows=502347 loops=1)
>> -> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081
>>width=145) (actual time=11.206..4003.521 rows=502347 loops=1)
>> -> Index Scan using d_referral_referral_md5_key on d_referral t1
>>(cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1
>>loops=502347)
>> Index Cond: ("outer".md5 = t1.referral_md5)
>> Total runtime: 3227830.752 ms
>>
>>
>
>Hmmm. It looks like postgresql is still picking a nested loop when it
>should be sorting something faster. Try doing a "set enable_nestloop =
>off" and see what you get.
>
>
New results with the above changes: (Rather a huge improvement!!!)
Thanks Scott. I will next attempt to make the cpu_* changes to see if
it the picks the correct plan.

explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1669281.60..3204008.48 rows=480082 width=149)
(actual time=157221.125..-412311.378 rows=502347 loops=1)
Hash Cond: ("outer".md5 = "inner".referral_md5)
-> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081
width=145) (actual time=11.537..1852.336 rows=502347 loops=1)
-> Hash (cost=1356358.48..1356358.48 rows=30344048 width=40)
(actual time=157187.530..157187.530 rows=0 loops=1)
-> Seq Scan on d_referral t1 (cost=0.00..1356358.48
rows=30344048 width=40) (actual time=14.134..115048.285 rows=27908024
loops=1)
Total runtime: 212595.909 ms
(6 rows)

Time: 213094.984 ms
tripmaster=# explain analyze SELECT t1.id, t2.md5, t2.url from url_temp
t2 LEFT OUTER JOIN d_url t1 ON t2.md5 = t1.url_md5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=2023843.40..3157938.15 rows=1379872 width=191)
(actual time=178150.113..867074.579 rows=1172920 loops=1)
Hash Cond: ("outer".md5 = "inner".url_md5)
-> Seq Scan on url_temp t2 (cost=0.00..50461.72 rows=1379872
width=187) (actual time=6.597..6692.324 rows=1172920 loops=1)
-> Hash (cost=1734904.72..1734904.72 rows=28018272 width=40)
(actual time=178124.568..178124.568 rows=0 loops=1)
-> Seq Scan on d_url t1 (cost=0.00..1734904.72 rows=28018272
width=40) (actual time=16.912..2639059.078 rows=23239137 loops=1)
Total runtime: 242846.965 ms
(6 rows)

Time: 243190.900 ms

>If that makes it faster, you may want to adjust the costs of the cpu_*
>stuff higher to see if that can force it to do the right thing.
>
>Looking at the amount of time taken by the nested loop, it looks like the
>problem to me.
>
>And why are you doing a left join of ONE row from one table against the
>whole temp table? Do you really need to do that? since there's only one
>row in the source table, and I'd guess is only matches one or a few rows
>from the temp table, this means you're gonna have that one row and a bunch
>of null filled rows to go with it.
>
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Don Bowman 2004-02-21 21:12:24 conceptual method to create high performance query involving time
Previous Message scott.marlowe 2004-02-20 21:10:07 Re: General performance questions about postgres on Apple