Skip site navigation (1) Skip section navigation (2)

Re: SQL 'in' vs join.

From: mlw <markw(at)mohawksoft(dot)com>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SQL 'in' vs join.
Date: 2000-11-30 13:37:42
Message-ID: 3A265826.B7AF2DBE@mohawksoft.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Hannu Krosing wrote:
> 
> mlw wrote:
> >
> > Why is a "select * from table1 where field in (select field from table2
> > where condition )"
> >
> > is so dramatically bad compared to:
> >
> > "select * from table1, table2 where table1.field = table2.field and
> > condition"
> >
> > I can't understand why the first query isn't optimized better than the
> > second one. The 'in' query forces a full table scan (it shouldn't) and
> > the second one uses the indexes. Does anyone know why?
> 
> Its not done yet, and probably hsomewhat difficult to do in a general
> fashion
> 
> > I know I am no SQL guru, but my gut tells me that the 'in' operator
> > should be far more efficient than a join.
> >
> > Here are the actual queries:
> >
> > cdinfo=# explain select trackid from zsong where muzenbr in (select
> > muzenbr from ztitles where title = 'Mulan') ;
> 
> try
> 
> explain
>  select trackid
>    from zsong
>   where muzenbr in (
>          select muzenbr
>            from ztitles
>           where title = 'Mulan'
>             and ztitles.muzenbr=zsong.muzenbr
>   );
> 
> this should hint the current optimizer to do the right thing;
> 
> -----------------
> Hannu

Nope:

cdinfo=# explain
cdinfo-#  select trackid
cdinfo-#    from zsong
cdinfo-#   where muzenbr in (
cdinfo(#          select muzenbr
cdinfo(#            from ztitles
cdinfo(#           where title = 'Mulan'
cdinfo(#             and ztitles.muzenbr=zsong.muzenbr
cdinfo(#   );
NOTICE:  QUERY PLAN:
 
Seq Scan on zsong  (cost=100000000.00..104474515.18 rows=2193213
width=4)
  SubPlan
    ->  Index Scan using ztitles_pkey on ztitles  (cost=0.00..4.05
rows=1 width=4)  


But what I also find odd is, look at the components:

cdinfo=# explain select muzenbr from ztitles where title = 'Mulan' ;
NOTICE:  QUERY PLAN:
 
Index Scan using ztitles_title_ndx on ztitles  (cost=0.00..7.08 rows=1
width=4)  

cdinfo=# explain select trackid from zsong where muzenbr in ( 1,2,3,4,5)
;
NOTICE:  QUERY PLAN:
 
Index Scan using zsong_muzenbr_ndx, zsong_muzenbr_ndx,
zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx on zsong 
(cost=0.00..392.66 rows=102 width=4) 


Now, given the two components, each with very low costs, it chooses to
do a sequential scan on the table. I don't get it. I have have been
having no end of problems with Postgres' optimizer. It just seems to be
brain dead at times. It is a huge point of frustration to me. I am tied
to postgres in my current project, and I fear that I will not be able to
implement certain features because of this sort of behavior.


-- 
http://www.mohawksoft.com

In response to

Responses

pgsql-hackers by date

Next:From: Michael ForkDate: 2000-11-30 13:38:32
Subject: Re:
Previous:From: Arno A. KarnerDate: 2000-11-30 13:04:34
Subject: compiling pg 7.0.3 on sco 5.0.5

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group