Re: Planner incorrectly choosing seq scan over index scan

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Arbash Meinel <john(at)arbash-meinel(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner incorrectly choosing seq scan over index scan
Date: 2005-08-03 12:48:38
Message-ID: fc5b04ca05080305483430d0ef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Btw - I tried playing around with some of the other planner cost constants
but I wasn't able to get the planner to choose the index scan. It seems like
the issue is that the estimated cost for fetching one row from the index (
3.02) is a little high in my case. Is there any way that I can adjust that
cost estimate? Are there any side effects of doing that? Or is my best
solution to simple set enable_hashjoin to off for this query?

Thanks,
Meetesh

On 8/2/05, Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com> wrote:
>
> Thanks Tom,
>
> That modifies the query plan slightly, but the planner still decides to do
> a hash join for the lte_user table aliased 't'. Though, if I make this
> change and set enable_hashjoin to off, the query plan (and execution time)
> gets even better.
>
> enable_hashjoin = on
> ----------------------------------
> QUERY PLAN
> Sort (cost=10113.35..10122.02 rows=3467 width=48) (actual time=
> 1203.000..1203.000 rows=3467 loops=1)
> Sort Key: c.sourceid, c.targetid
> -> Nested Loop (cost=8711.19..9909.50 rows=3467 width=48) (actual time=
> 1156.000..1203.000 rows=3467 loops=1)
> -> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1
> width=16) (actual time=0.000..0.000 rows=1 loops=1)
> Index Cond: (617004 = user_id)
> -> Hash Join (cost=8711.19..9776.46 rows=3467 width=40) (actual time=
> 1156.000..1187.000 rows=3467 loops=1)
> Hash Cond: ("outer".targetid = "inner".user_id)
> -> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32)
> (actual time=0.000..16.000 rows=3467 loops=1)
> Filter: (sourceid = 617004)
> -> Hash (cost=8012.55..8012.55 rows=279455 width=16) (actual time=
> 1141.000..1141.000 rows=0 loops=1)
> -> Seq Scan on lte_user t (cost=0.00..8012.55 rows=279455 width=16)
> (actual time=0.000..720.000 rows=279395 loops=1)
> Total runtime: 1218.000 ms
>
> enable_hashjoin = off
> -----------------------------------
> QUERY PLAN
> Sort (cost=10942.56..10951.22 rows=3467 width=48) (actual time=
> 188.000..188.000 rows=3467 loops=1)
> Sort Key: c.sourceid, c.targetid
> -> Nested Loop (cost=0.00..10738.71 rows=3467 width=48) (actual time=
> 0.000..188.000 rows=3467 loops=1)
> -> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1
> width=16) (actual time=0.000..0.000 rows=1 loops=1)
> Index Cond: (617004 = user_id)
> -> Nested Loop (cost=0.00..10605.67 rows=3467 width=40) (actual time=
> 0.000..157.000 rows=3467 loops=1)
> -> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32)
> (actual time=0.000..15.000 rows=3467 loops=1)
> Filter: (sourceid = 617004)
> -> Index Scan using lte_user_pkey on lte_user t (cost=0.00..3.02 rows=1
> width=16) (actual time=0.028..0.037 rows=1 loops=3467)
> Index Cond: ("outer".targetid = t.user_id)
> Total runtime: 188.000 ms
>
> Thanks,
> Meetesh
>
> On 8/2/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com> writes:
> > > Sure. The lte_user table is just a collection of users. user_id is
> > assigned=
> > > uniquely using a sequence. During some processing, we create a
> > candidates=
> > > table (candidates617004 in our case). This table is usually a temp
> > table.=
> > > sourceid is a user_id (in this case it is always 617004) and targetid
> > is=20
> > > also a user_id (2860 distinct values out of 3467). The rest of the=20
> > > information is either only used in the select clause or not used at
> > all=20
> > > during this processing.
> >
> > If you know that sourceid has only a single value, it'd probably be
> > helpful to call out that value in the query, ie,
> > where ... AND c.sourceId = 617004 ...
> >
> > regards, tom lane
> >
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Lane Van Ingen 2005-08-03 13:15:34 Is There A Windows Version of Performance Tuning Documents?
Previous Message Richard Huxton 2005-08-03 08:53:29 Re: "nice"/low priority Query