Re: Planner incorrectly choosing seq scan over index scan

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Arbash Meinel <john(at)arbash-meinel(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner incorrectly choosing seq scan over index scan
Date: 2005-08-02 07:05:50
Message-ID: fc5b04ca05080200054a6da4c3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Tom,

That modifies the query plan slightly, but the planner still decides to do a
hash join for the lte_user table aliased 't'. Though, if I make this change
and set enable_hashjoin to off, the query plan (and execution time) gets
even better.

enable_hashjoin = on
----------------------------------
QUERY PLAN
Sort (cost=10113.35..10122.02 rows=3467 width=48) (actual time=
1203.000..1203.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=8711.19..9909.50 rows=3467 width=48) (actual time=
1156.000..1203.000 rows=3467 loops=1)
-> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1
width=16) (actual time=0.000..0.000 rows=1 loops=1)
Index Cond: (617004 = user_id)
-> Hash Join (cost=8711.19..9776.46 rows=3467 width=40) (actual time=
1156.000..1187.000 rows=3467 loops=1)
Hash Cond: ("outer".targetid = "inner".user_id)
-> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32)
(actual time=0.000..16.000 rows=3467 loops=1)
Filter: (sourceid = 617004)
-> Hash (cost=8012.55..8012.55 rows=279455 width=16) (actual time=
1141.000..1141.000 rows=0 loops=1)
-> Seq Scan on lte_user t (cost=0.00..8012.55 rows=279455 width=16) (actual
time=0.000..720.000 rows=279395 loops=1)
Total runtime: 1218.000 ms

enable_hashjoin = off
-----------------------------------
QUERY PLAN
Sort (cost=10942.56..10951.22 rows=3467 width=48) (actual time=
188.000..188.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=0.00..10738.71 rows=3467 width=48) (actual time=
0.000..188.000 rows=3467 loops=1)
-> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1
width=16) (actual time=0.000..0.000 rows=1 loops=1)
Index Cond: (617004 = user_id)
-> Nested Loop (cost=0.00..10605.67 rows=3467 width=40) (actual time=
0.000..157.000 rows=3467 loops=1)
-> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32)
(actual time=0.000..15.000 rows=3467 loops=1)
Filter: (sourceid = 617004)
-> Index Scan using lte_user_pkey on lte_user t (cost=0.00..3.02 rows=1
width=16) (actual time=0.028..0.037 rows=1 loops=3467)
Index Cond: ("outer".targetid = t.user_id)
Total runtime: 188.000 ms

Thanks,
Meetesh

On 8/2/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com> writes:
> > Sure. The lte_user table is just a collection of users. user_id is
> assigned=
> > uniquely using a sequence. During some processing, we create a
> candidates=
> > table (candidates617004 in our case). This table is usually a temp
> table.=
> > sourceid is a user_id (in this case it is always 617004) and targetid
> is=20
> > also a user_id (2860 distinct values out of 3467). The rest of the=20
> > information is either only used in the select clause or not used at
> all=20
> > during this processing.
>
> If you know that sourceid has only a single value, it'd probably be
> helpful to call out that value in the query, ie,
> where ... AND c.sourceId = 617004 ...
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alon Goldshuv 2005-08-02 14:59:43 Re: COPY FROM performance improvements
Previous Message Tom Lane 2005-08-02 00:15:26 Re: Planner incorrectly choosing seq scan over index scan