Re: in() VS exists()

From: Julien <jcigar(at)ulb(dot)ac(dot)be>
To: Bart Degryse <Bart(dot)Degryse(at)indicator(dot)be>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: in() VS exists()
Date: 2008-03-13 16:10:00
Message-ID: 1205424600.3240.30.camel@frodon.be-bif.ulb.ac.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

If I understood well the query plan, the planner optimize the
IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) :

-> Hash IN Join (cost=240.95..4011.20 rows=1436 width=4) (actual
time=93.971..201.908 rows=1431 loops=1)
Hash Cond: ("outer".id = "inner".specimen_id)

so I guess that :

select count(sp.id) from specimens sp where sp.id in (select specimen_id
from sequences);

is almost the same as :

select count(sp.id) from specimens sp INNER JOIN (select specimen_id
from sequences GROUP BY specimen_id) as foo on foo.specimen_id = sp.id;

?

Thanks,
Julien

On Thu, 2008-03-13 at 14:46 +0100, Bart Degryse wrote:
> The chapter on indexes in the manual should give you a pretty good
> idea on the why.
> IN and EXISTS are not the only possibilities, you can also use inner
> or outer joins.
> Which solution performs best depends on the data, the database
> version, the available indexes, ...
>
> >>> Julien <jcigar(at)ulb(dot)ac(dot)be> 2008-03-13 15:47 >>>
> Hello,
>
> Does anyone has an idea why sometimes:
> - select ... where ... in (select ...)
> is faster than :
> - select ... where ... exists(select ...)
> and sometimes it's the opposite ?
>
> I had such a situation, I've pasted the queries on:
> http://rafb.net/p/KXNZ6892.html and http://rafb.net/p/jvo5DO38.html
>
> It's running PostgreSQL 8.1 with an effective_cache_size of 30000.
>
> specimens.id is the primary key and there are indexes on
> sequences(specimen_id) and specimen_measurements(specimen_id)
>
> Is there a general "rule" to know when to use the in() version and
> when
> to use the exists() version ? Is it true to say that the exists()
> version is more scalable (with many rows) than the in() version (from
> the little tests I made it seems the case) ?
>
> Thanks,
> Julien
>
> --
> Julien Cigar
> Belgian Biodiversity Platform
> http://www.biodiversity.be
> Université Libre de Bruxelles (ULB)
> Campus de la Plaine CP 257
> Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
> Boulevard du Triomphe, entrée ULB 2
> B-1050 Bruxelles
> Mail: jcigar(at)ulb(dot)ac(dot)be
> @biobel: http://biobel.biodiversity.be/person/show/471
> Tel : 02 650 57 52
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
--
Julien Cigar
Belgian Biodiversity Platform
http://www.biodiversity.be
Université Libre de Bruxelles (ULB)
Campus de la Plaine CP 257
Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
Boulevard du Triomphe, entrée ULB 2
B-1050 Bruxelles
Mail: jcigar(at)ulb(dot)ac(dot)be
@biobel: http://biobel.biodiversity.be/person/show/471
Tel : 02 650 57 52

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Julien 2008-03-13 16:27:43 Re: in() VS exists()
Previous Message Bart Degryse 2008-03-13 14:50:06 Re: in() VS exists()