bad planner pick... but why?

From: <me(at)alternize(dot)com>
To: <pgsql-novice(at)postgresql(dot)org>
Subject: bad planner pick... but why?
Date: 2006-10-16 15:57:32
Message-ID: 03b301c6f13b$caaf7a20$6501a8c0@iwing
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

hi list

please have a look at this two queries:

----------------------------------------

SELECT mov_id FROM oldtables.movies LEFT JOIN oldtables.content ON
movies.mov_id = content.c_m_id
WHERE mov_id IN (SELECT DISTINCT rel_movieid FROM infos.rel_persons WHERE
rel_personid = 40544)

plan:

Hash IN Join (cost=205.04..14030.40 rows=1 width=4) (actual
time=105.568..277.782 rows=1 loops=1)
Hash Cond: ("outer".mov_id = "inner".rel_movieid)
-> Hash Left Join (cost=184.00..13824.49 rows=36973 width=4) (actual
time=7.563..260.627 rows=36997 loops=1)
Hash Cond: ("outer".mov_id = "inner".c_m_id)
-> Seq Scan on movies (cost=0.00..13140.73 rows=36973 width=4)
(actual time=0.070..206.254 rows=36997 loops=1)
-> Hash (cost=174.60..174.60 rows=3760 width=4) (actual
time=7.467..7.467 rows=3760 loops=1)
-> Seq Scan on content (cost=0.00..174.60 rows=3760 width=4)
(actual time=0.015..4.729 rows=3760 loops=1)
-> Hash (cost=21.03..21.03 rows=1 width=4) (actual time=0.113..0.113
rows=1 loops=1)
-> Unique (cost=20.99..21.02 rows=1 width=4) (actual
time=0.108..0.109 rows=1 loops=1)
-> Sort (cost=20.99..21.01 rows=6 width=4) (actual
time=0.106..0.107 rows=1 loops=1)
Sort Key: rel_persons.rel_movieid
-> Index Scan using rel_persons_personid_idx on
rel_persons (cost=0.00..20.92 rows=6 width=4) (actual time=0.094..0.098
rows=1 loops=1)
Index Cond: (rel_personid = 40544)
Total runtime: 277.901 ms

----------------------------------------

SELECT mov_id FROM oldtables.movies LEFT JOIN oldtables.content ON
movies.mov_id = content.c_m_id
WHERE c_m_id IN (SELECT DISTINCT rel_movieid FROM infos.rel_persons WHERE
rel_personid = 40544)

plan:

Nested Loop (cost=20.99..32.69 rows=1 width=4) (actual time=0.169..0.175
rows=1 loops=1)
-> Nested Loop (cost=20.99..27.05 rows=1 width=8) (actual
time=0.158..0.162 rows=1 loops=1)
-> Unique (cost=20.99..21.02 rows=1 width=4) (actual
time=0.114..0.116 rows=1 loops=1)
-> Sort (cost=20.99..21.01 rows=6 width=4) (actual
time=0.113..0.113 rows=1 loops=1)
Sort Key: rel_persons.rel_movieid
-> Index Scan using rel_persons_personid_idx on
rel_persons (cost=0.00..20.92 rows=6 width=4) (actual time=0.098..0.102
rows=1 loops=1)
Index Cond: (rel_personid = 40544)
-> Index Scan using movies_mov_id_idx on movies (cost=0.00..6.00
rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
Index Cond: ("outer".rel_movieid = movies.mov_id)
-> Index Scan using content_pkey on content (cost=0.00..5.63 rows=1
width=4) (actual time=0.008..0.009 rows=1 loops=1)
Index Cond: ("outer".mov_id = content.c_m_id)
Total runtime: 0.295 ms

----------------------------------------

query #1 is factor 1000 slower, because the two tables "movies" (~40k
entries) and "content" (~30k entries) seem to be joined prior to filtering
by the IN (....). any ideas why the planer decides not to first evaluate the
IN (...) statement in the first case?

here's the plan for the IN (...) subselect:

----------------------------------------

SELECT DISTINCT rel_movieid FROM infos.rel_persons WHERE rel_personid =
40544

plan:
Unique (cost=20.99..21.02 rows=1 width=4) (actual time=0.112..0.113 rows=1
loops=1)
-> Sort (cost=20.99..21.01 rows=6 width=4) (actual time=0.110..0.110
rows=1 loops=1)
Sort Key: rel_movieid
-> Index Scan using rel_persons_personid_idx on rel_persons
(cost=0.00..20.92 rows=6 width=4) (actual time=0.095..0.098 rows=1 loops=1)
Index Cond: (rel_personid = 40544)
Total runtime: 0.155 ms

----------------------------------------

its pgsql 8.1

thanks,
thomas

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2006-10-16 16:17:57 Re: bad planner pick... but why?
Previous Message Yadnyesh Joshi 2006-10-16 15:45:59 Re: Using host variables -- segmentation fault