Re: EXISTS vs IN vs OUTER JOINS

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Tomasz Myrta <jasiek(at)klaster(dot)net>, PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re: EXISTS vs IN vs OUTER JOINS
Date: 2002-12-19 17:15:36
Message-ID: web-2291801@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tomasz,

> Few days ago I read, that EXISTS is better than IN, but only if there
> are many records (how many?). I was wondering which one is better and
> when. Did anyone try to compare these queries doing the same work:
>
> - select * from some_table t
> where t.id [not] in (select id from filter);
> -select * from some_table t
> where [not] exists (select * from filter where id=t.id);

The rule I use is: if I expect the sub-select to return more than 12
records 20% or more of the time, use EXISTS. The speed gain for IN on
small lists is not as dramatic as the speed loss for EXISTS on large
lists.

More importantly, the difference between NOT IN and NOT EXISTS can be
as much as 20:1 on large sub-selects, as opposed to IN and EXISTS,
where I have rarely seen a difference of more than 3:1. As I
understand it, this is because NOT EXISTS can use optimized join
algorithms to locate matching rows, whereas NOT IN must compare each
row against every possible matching value in the subselect.

It also makes a difference whether or not the referenced field(s) in
the subselect is indexed. EXISTS will often use an index to compare
the values in the master query to the sub-query. As far as I know, IN
can use an index to retrieve the subquery values, but not to sort or
compare them after they have been retreived into memory.

> -select * from some_table t
> left join filter f using (id)
> where f.id is [not] null;

This will not get you the same result as the above. It will get you
all rows from t+f where a record is present in f and has (or does not
have) a NULL value for f.id. While this type of query works in MS
Access, it will not work in SQL92/99-commpliant databases.

Incidentally, the dramatic differences between IN and EXISTS are not
only a "PostgreSQL Thing". The same rules apply to MS SQL Server and
SQL Anywhere, for the same reasons.

-Josh Berkus

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message jasiek 2002-12-19 17:27:33 Re: EXISTS vs IN vs OUTER JOINS
Previous Message Tomasz Myrta 2002-12-19 10:12:01 EXISTS vs IN vs OUTER JOINS