Re: Lot'sa joins - performance tip-up, please?

From: Mario Splivalo <msplival(at)jagor(dot)srce(dot)hr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Lot'sa joins - performance tip-up, please?
Date: 2006-05-04 14:45:57
Message-ID: 1146753957.8538.30.camel@localhost.localdomain
Lists: pgsql-performance

On Wed, 2006-05-03 at 13:58 -0400, Tom Lane wrote:
> Mario Splivalo <msplival(at)jagor(dot)srce(dot)hr> writes:
> > I have a quite large query that takes over a minute to run on my laptop.
>
> The EXPLAIN output you provided doesn't seem to agree with the stated
> query. Where'd the "service_id = 1102" condition come from?

I guess I copy-pasted the additional WHERE into the EXPLAIN ANALYZE query.
This is the correct one, without that WHERE:

Hash Left Join  (cost=198628.35..202770.61 rows=121 width=264) (actual time=998008.264..999645.322 rows=5706 loops=1)
  Hash Cond: ("outer".message_id = "inner".message_id)
  ->  Merge Left Join  (cost=21943.23..21950.96 rows=121 width=238) (actual time=4375.510..4540.772 rows=5706 loops=1)
        Merge Cond: ("outer".message_id = "inner".message_id)
        ->  Sort  (cost=21847.62..21847.92 rows=121 width=230) (actual time=3304.787..3378.515 rows=5706 loops=1)
              Sort Key: messages.id
              ->  Hash Join  (cost=20250.16..21843.43 rows=121 width=230) (actual time=1617.370..3102.470 rows=5706 loops=1)
                    Hash Cond: ("outer".message_id = "inner".id)
                    ->  Seq Scan on ticketing_messages  (cost=0.00..1212.37 rows=75937 width=14) (actual time=10.554..609.967 rows=75937 loops=1)
                    ->  Hash  (cost=20244.19..20244.19 rows=2391 width=216) (actual time=1572.889..1572.889 rows=5706 loops=1)
                          ->  Nested Loop  (cost=1519.21..20244.19 rows=2391 width=216) (actual time=385.582..1449.207 rows=5706 loops=1)
                                ->  Seq Scan on services  (cost=0.00..4.20 rows=3 width=54) (actual time=20.829..20.859 rows=2 loops=1)
                                      Filter: (type_id = 10)
                                ->  Bitmap Heap Scan on messages  (cost=1519.21..6726.74 rows=1594 width=162) (actual time=182.346..678.800 rows=2853 loops=2)
                                      Recheck Cond: (("outer".id = messages.service_id) AND (messages.receiving_time >= '2006-02-12 00:00:00+01'::timestamp with time zone) AND (messages.receiving_time <= '2006-03-18 23:00:00+01'::timestamp with time zone))
                                      ->  BitmapAnd  (cost=1519.21..1519.21 rows=1594 width=0) (actual time=164.311..164.311 rows=0 loops=2)
                                            ->  Bitmap Index Scan on idx_service_id  (cost=0.00..84.10 rows=14599 width=0) (actual time=66.809..66.809 rows=37968 loops=2)
                                                  Index Cond: ("outer".id = messages.service_id)
                                            ->  Bitmap Index Scan on idx_messages_receiving_time  (cost=0.00..1434.87 rows=164144 width=0) (actual time=192.633..192.633 rows=184741 loops=1)
                                                  Index Cond: ((receiving_time >= '2006-02-12 00:00:00+01'::timestamp with time zone) AND (receiving_time <= '2006-03-18 23:00:00+01'::timestamp with time zone))
        ->  Sort  (cost=95.62..99.17 rows=1421 width=8) (actual time=1070.678..1072.999 rows=482 loops=1)
              Sort Key: ticketing_winners.message_id
              ->  Seq Scan on ticketing_winners  (cost=0.00..21.21 rows=1421 width=8) (actual time=424.836..1061.834 rows=1421 loops=1)
  ->  Hash  (cost=176144.30..176144.30 rows=57925 width=26) (actual time=993592.980..993592.980 rows=57925 loops=1)
        ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26) (actual time=1074.984..992536.243 rows=57925 loops=1)
              ->  Seq Scan on ticketing_codes_played  (cost=0.00..863.25 rows=57925 width=8) (actual time=74.479..2047.993 rows=57925 loops=1)
              ->  Index Scan using ticketing_codes_pk on ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual time=17.044..17.052 rows=1 loops=57925)
                    Index Cond: (ticketing_codes.code_id = "outer".code_id)
Total runtime: 999778.981 ms

> In general, I'd suggest playing around with the join order. Existing
> releases of PG tend to throw up their hands when faced with a mixture of
> outer joins and regular joins, and just join the tables in the order
> listed. 8.2 will be smarter about this, but for now you have to do it
> by hand ...
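
[Editor's note: one way to "do it by hand" on pre-8.2 releases is to spell out the joins explicitly in the order you want and lower join_collapse_limit so the planner keeps that order. A hypothetical sketch, using table names from the plan above (the column list and join conditions are guesses, since the original query isn't shown in this message):]

```sql
-- Treat explicit JOIN syntax as a fixed join order for this session.
SET join_collapse_limit = 1;

SELECT ...
FROM services s
JOIN messages m             ON m.service_id = s.id
JOIN ticketing_messages tm  ON tm.message_id = m.id
LEFT JOIN ticketing_winners tw ON tw.message_id = tm.message_id
WHERE s.type_id = 10;
```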

No luck for me there. But I found that if I first join
ticketing_codes and ticketing_codes_played, put the result into a temporary
table, and then join that temporary table with the rest of the query
(the SELECT that is in parentheses is moved to a temporary table),
the query runs almost twice as fast.

As mentioned before, ticketing_codes has 11000000 records.
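
[Editor's note: the temp-table workaround described above could look roughly like this; the column names are illustrative, since the original query isn't shown in this message:]

```sql
-- Materialize the expensive codes join once, then join the result
-- into the rest of the query.  (tmp_codes and code_value are
-- hypothetical names.)
CREATE TEMPORARY TABLE tmp_codes AS
    SELECT tcp.message_id, tc.code_value
    FROM ticketing_codes_played tcp
    JOIN ticketing_codes tc ON tc.code_id = tcp.code_id;

-- Give the planner statistics on the new temp table.
ANALYZE tmp_codes;

SELECT ...
FROM ...
LEFT JOIN tmp_codes USING (message_id);
```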

Mario

P.S. Is it just me, or is posting to pgsql-performance lagged quite a
bit?
