Query tuning

From: Kevin Kempter <kevink(at)consistentstate(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query tuning
Date: 2009-08-19 16:28:41
Message-ID: 200908191028.42105.kevink@consistentstate.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all;

we've been fighting this query for a few days now. we bumped up the statistict
target for the a.id , c.url_hits_id and the b.id columns below to 250 and ran
an analyze on the relevant tables. we killed it after 8hrs.

Note the url_hits table has > 1.4billion rows

Any suggestions?

$ psql -ef expl.sql pwreport
explain
select
a.id,
ident_id,
time,
customer_name,
extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
category_id
from
pwreport.url_hits a left outer join
pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
pwreport.ident b,
pwreport.timezone e
where
a.ident_id = b.id
and b.timezone_id = e.id
and time >= extract ('epoch' from timestamp '2009-08-12')
and time < extract ('epoch' from timestamp '2009-08-13' )
and direction = 'REQUEST'
;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=47528508.61..180424544.59 rows=10409251 width=53)
Merge Cond: (c.url_hits_id = a.id)
-> Index Scan using mt_url_hits_category_jt_url_hits_id_index on
url_hits_category_jt c (cost=0.00..122162596.63 rows=4189283233 width=8)
-> Sort (cost=47528508.61..47536931.63 rows=3369210 width=49)
Sort Key: a.id
-> Hash Join (cost=2565.00..47163219.21 rows=3369210 width=49)
Hash Cond: (b.timezone_id = e.id)
-> Hash Join (cost=2553.49..47116881.07 rows=3369210
width=37)
Hash Cond: (a.ident_id = b.id)
-> Seq Scan on url_hits a (cost=0.00..47051154.89
rows=3369210 width=12)
Filter: ((direction =
'REQUEST'::proxy_direction_enum) AND (("time")::double precision >=
1250035200::double precision) AND (("time")::double precision <
1250121600::double precision))
-> Hash (cost=2020.44..2020.44 rows=42644 width=29)
-> Seq Scan on ident b (cost=0.00..2020.44
rows=42644 width=29)
-> Hash (cost=6.78..6.78 rows=378 width=20)
-> Seq Scan on timezone e (cost=0.00..6.78 rows=378
width=20)
(15 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Grzegorz Jaśkiewicz 2009-08-19 16:38:08 Re: Query tuning
Previous Message Scott Marlowe 2009-08-19 16:19:14 Re: PG 8.3 and server load