Planner incorrectly choosing seq scan over index scan

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner incorrectly choosing seq scan over index scan
Date: 2005-08-01 22:19:27
Message-ID: fc5b04ca05080115195d15c456@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

We're using 8.0.3 and we're seeing a problem where the planner is choosing a
seq scan and hash join over an index scan. If I set enable_hashjoin to off,
then I get the plan I'm expecting and the query runs a lot faster. I've also
tried lowering the random page cost (even to 1) but the planner still
chooses to use the hash join.

Does anyone have any thoughts/suggestions? I saw that there was a thread
recently in which the planner wasn't correctly estimating the cost for
queries using LIMIT. Is it possible that something similar is happening here
(perhaps because of the sort) and that the patch Tom proposed would fix it?

Thanks. Here are the various queries and plans:

Normal settings
------------------------
explain analyze
select
c.sourceId,
c.targetId,
abs(c.tr <http://c.tr> - c.sr <http://c.sr>) as xmy,
(c.sr <http://c.sr> - s.ar <http://s.ar>) * (c.tr <http://c.tr> -
t.ar<http://t.ar>)
as xy,
(c.sr <http://c.sr> - s.ar <http://s.ar>) * (c.sr <http://c.sr> -
s.ar<http://s.ar>)
as x2,
(c.tr <http://c.tr> - t.ar <http://t.ar>) * (c.tr <http://c.tr> -
t.ar<http://t.ar>)
as y2
from
candidates617004 c,
lte_user s,
lte_user t
where
c.sourceId = s.user_id
and c.targetId = t.user_id
order by
c.sourceId,
c.targetId;

QUERY PLAN
Sort (cost=13430.57..13439.24 rows=3467 width=48) (actual time=
1390.000..1390.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Merge Join (cost=9912.07..13226.72 rows=3467 width=48) (actual time=
1344.000..1375.000 rows=3467 loops=1)
Merge Cond: ("outer".user_id = "inner".sourceid)
-> Index Scan using lte_user_pkey on lte_user s
(cost=0.00..16837.71rows=279395 width=16) (actual time=
0.000..95.000 rows=50034 loops=1)
-> Sort (cost=9912.07..9920.73 rows=3467 width=40) (actual time=
1156.000..1156.000 rows=3467 loops=1)
Sort Key: c.sourceid
-> Hash Join (cost=8710.44..9708.21 rows=3467 width=40) (actual time=
1125.000..1156.000 rows=3467 loops=1)
Hash Cond: ("outer".targetid = "inner".user_id)
-> Seq Scan on candidates617004 c (cost=0.00..67.67 rows=3467 width=32)
(actual time=0.000..0.000 rows=3467 loops=1)
-> Hash (cost=8011.95..8011.95 rows=279395 width=16) (actual time=
1125.000..1125.000 rows=0 loops=1)
-> Seq Scan on lte_user t (cost=0.00..8011.95 rows=279395 width=16) (actual
time=0.000..670.000 rows=279395 loops=1)
Total runtime: 1406.000 ms

enable_hashjoin disabled
----------------------------------------
QUERY PLAN
Sort (cost=14355.37..14364.03 rows=3467 width=48) (actual time=
391.000..391.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=271.52..14151.51 rows=3467 width=48) (actual time=
203.000..359.000 rows=3467 loops=1)
-> Merge Join (cost=271.52..3490.83 rows=3467 width=40) (actual time=
203.000..218.000 rows=3467 loops=1)
Merge Cond: ("outer".user_id = "inner".sourceid)
-> Index Scan using lte_user_pkey on lte_user s
(cost=0.00..16837.71rows=279395 width=16) (actual time=
0.000..126.000 rows=50034 loops=1)
-> Sort (cost=271.52..280.19 rows=3467 width=32) (actual
time=15.000..30.000rows=3467 loops=1)
Sort Key: c.sourceid
-> Seq Scan on candidates617004 c (cost=0.00..67.67 rows=3467 width=32)
(actual time=0.000..0.000 rows=3467 loops=1)
-> Index Scan using lte_user_pkey on lte_user t (cost=0.00..3.03 rows=1
width=16) (actual time=0.031..0.036 rows=1 loops=3467)
Index Cond: ("outer".targetid = t.user_id)
Total runtime: 406.000 ms

random_page_cost set to 1.5
----------------------------------------------
QUERY PLAN
Sort (cost=12702.62..12711.29 rows=3467 width=48) (actual time=
1407.000..1407.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Merge Join (cost=9912.07..12498.77 rows=3467 width=48) (actual time=
1391.000..1407.000 rows=3467 loops=1)
Merge Cond: ("outer".user_id = "inner".sourceid)
-> Index Scan using lte_user_pkey on lte_user s
(cost=0.00..12807.34rows=279395 width=16) (actual time=
0.000..46.000 rows=50034 loops=1)
-> Sort (cost=9912.07..9920.73 rows=3467 width=40) (actual time=
1188.000..1188.000 rows=3467 loops=1)
Sort Key: c.sourceid
-> Hash Join (cost=8710.44..9708.21 rows=3467 width=40) (actual time=
1157.000..1188.000 rows=3467 loops=1)
Hash Cond: ("outer".targetid = "inner".user_id)
-> Seq Scan on candidates617004 c (cost=0.00..67.67 rows=3467 width=32)
(actual time=0.000..15.000 rows=3467 loops=1)
-> Hash (cost=8011.95..8011.95 rows=279395 width=16) (actual time=
1157.000..1157.000 rows=0 loops=1)
-> Seq Scan on lte_user t (cost=0.00..8011.95 rows=279395 width=16) (actual
time=0.000..750.000 rows=279395 loops=1)
Total runtime: 1422.000 ms

random_page_cost set to 1.5 and enable_hashjoin set to false
--------------------------------------------------------------------------------------------------
QUERY PLAN
Sort (cost=13565.58..13574.25 rows=3467 width=48) (actual time=
390.000..390.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=271.52..13361.73 rows=3467 width=48) (actual time=
203.000..360.000 rows=3467 loops=1)
-> Merge Join (cost=271.52..2762.88 rows=3467 width=40) (actual time=
203.000..250.000 rows=3467 loops=1)
Merge Cond: ("outer".user_id = "inner".sourceid)
-> Index Scan using lte_user_pkey on lte_user s
(cost=0.00..12807.34rows=279395 width=16) (actual time=
0.000..48.000 rows=50034 loops=1)
-> Sort (cost=271.52..280.19 rows=3467 width=32) (actual
time=15.000..31.000rows=3467 loops=1)
Sort Key: c.sourceid
-> Seq Scan on candidates617004 c (cost=0.00..67.67 rows=3467 width=32)
(actual time=0.000..15.000 rows=3467 loops=1)
-> Index Scan using lte_user_pkey on lte_user t (cost=0.00..3.02 rows=1
width=16) (actual time=0.023..0.023 rows=1 loops=3467)
Index Cond: ("outer".targetid = t.user_id)
Total runtime: 406.000 ms

Thanks,
Meetesh

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tobias Brox 2005-08-01 22:37:08 Re: Planner incorrectly choosing seq scan over index scan
Previous Message Michael Parker 2005-08-01 22:09:07 Re: Performance problems testing with Spamassassin 3.1.0