Re: SQL 'in' vs join.

From: mlw <markw(at)mohawksoft(dot)com>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SQL 'in' vs join.
Date: 2000-11-30 13:37:42
Message-ID: 3A265826.B7AF2DBE@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hannu Krosing wrote:
>
> mlw wrote:
> >
> > Why is a "select * from table1 where field in (select field from table2
> > where condition )"
> >
> > is so dramatically bad compared to:
> >
> > "select * from table1, table2 where table1.field = table2.field and
> > condition"
> >
> > I can't understand why the first query isn't optimized better than the
> > second one. The 'in' query forces a full table scan (it shouldn't) and
> > the second one uses the indexes. Does anyone know why?
>
> Its not done yet, and probably hsomewhat difficult to do in a general
> fashion
>
> > I know I am no SQL guru, but my gut tells me that the 'in' operator
> > should be far more efficient than a join.
> >
> > Here are the actual queries:
> >
> > cdinfo=# explain select trackid from zsong where muzenbr in (select
> > muzenbr from ztitles where title = 'Mulan') ;
>
> try
>
> explain
> select trackid
> from zsong
> where muzenbr in (
> select muzenbr
> from ztitles
> where title = 'Mulan'
> and ztitles.muzenbr=zsong.muzenbr
> );
>
> this should hint the current optimizer to do the right thing;
>
> -----------------
> Hannu

Nope:

cdinfo=# explain
cdinfo-# select trackid
cdinfo-# from zsong
cdinfo-# where muzenbr in (
cdinfo(# select muzenbr
cdinfo(# from ztitles
cdinfo(# where title = 'Mulan'
cdinfo(# and ztitles.muzenbr=zsong.muzenbr
cdinfo(# );
NOTICE: QUERY PLAN:

Seq Scan on zsong (cost=100000000.00..104474515.18 rows=2193213
width=4)
SubPlan
-> Index Scan using ztitles_pkey on ztitles (cost=0.00..4.05
rows=1 width=4)

But what I also find odd is, look at the components:

cdinfo=# explain select muzenbr from ztitles where title = 'Mulan' ;
NOTICE: QUERY PLAN:

Index Scan using ztitles_title_ndx on ztitles (cost=0.00..7.08 rows=1
width=4)

cdinfo=# explain select trackid from zsong where muzenbr in ( 1,2,3,4,5)
;
NOTICE: QUERY PLAN:

Index Scan using zsong_muzenbr_ndx, zsong_muzenbr_ndx,
zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx on zsong
(cost=0.00..392.66 rows=102 width=4)

Now, given the two components, each with very low costs, it chooses to
do a sequential scan on the table. I don't get it. I have have been
having no end of problems with Postgres' optimizer. It just seems to be
brain dead at times. It is a huge point of frustration to me. I am tied
to postgres in my current project, and I fear that I will not be able to
implement certain features because of this sort of behavior.

--
http://www.mohawksoft.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Fork 2000-11-30 13:38:32 Re:
Previous Message Arno A. Karner 2000-11-30 13:04:34 compiling pg 7.0.3 on sco 5.0.5