Re: optimizing select ... not in (select ...)

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Laurent Martelli <laurent(at)aopsys(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: optimizing select ... not in (select ...)
Date: 2001-08-13 15:16:43
Message-ID: web-101179@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Laurent,

> select distinct on (Pictures.PictureID) * from Pictures where
> Pictures.PictureID not in (select distinct PictureID from Keywords);
>
> and I find it a bit slow. Does anybody have suggestions to run this
> faster ? (I have indexes on PictureID on both Pictures and Keywords)

NOT IN is always slow on all RDBMS that I know, unless the subselect has
a very small (<100) return set. This is because the engine has to
compare each row in the master query against every value returned by NOT
IN, one row at a time.

Try the WHERE NOT EXISTS construction instead:
SELECT * FROM Pictures
WHERE NOT EXISTS ( SELECT pictureID FROM keywords
WHERE keywords.pictureID = Pictures.pictureID );

This uses the DB engine's JOIN functionality and thus runs considerably
faster.

BTW, all those "DISTINCT" in the query example you gave, assuming that
PictureID is the unique index of Pictures, are completely superfluous
and will only slow the query down. Particularly the use of DISTINCT in
a subquery should only be used if the contents of the subquery will be
displayed as part of the result set.

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

Attachment Content-Type Size
unknown_filename text/plain 2 bytes
unknown_filename text/plain 2 bytes
unknown_filename text/plain 2 bytes

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Bruce Momjian 2001-08-13 15:59:48 Re: optimizing select ... not in (select ...)
Previous Message Tomas Berndtsson 2001-08-13 14:55:22 Re: optimizing select ... not in (select ...)