Re: [SQL] NOT IN clause performing badly

From: Herouth Maoz <herouth(at)oumail(dot)openu(dot)ac(dot)il>
To: Christian Rudow <Christian(dot)Rudow(at)thinx(dot)ch>, PostgreSQL SQL <pgsql-sql(at)postgreSQL(dot)org>
Subject: Re: [SQL] NOT IN clause performing badly
Date: 1999-07-26 11:12:02
Message-ID: l03130303b3c1edc28fd2@[147.233.159.109]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

At 12:09 +0300 on 26/07/1999, Christian Rudow wrote:

> Questions
> ---------
> Is this a known problem ?
> What is causing the problem ?
> Is it a problem of RDBMS parameters ?
> Is there an alternative to the NOT IN clause ?

NOT IN is an inefficient way of running such a query. I would suggest:

- After creating tempx1, create an index on its id field
- Use the following instead of NOT IN:

select id from mytable
where NOT EXISTS (
select * from tempx1
where tempx1.id = mytable.id
);

What is the difference?

In the NOT IN version, for each of the 10000 ids in mytable, it has to
search through 8000 records to see whether the current mytable.id is there
or not. This would give you 80000000 operations. Especially since there is
no index on tempx1.

The EXISTS query I gave uses the index on tempx1.id for each of the 10000
ids. Thus, you have 10000 * log 8000, more or less.

But you have to count in the penalty of creating the index on tempx1.id in
the first place.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message John Reynolds 1999-07-26 11:23:39 Re: [SQL] RE: Tool for generating entity relation diagrams.
Previous Message D'Arcy J.M. Cain 1999-07-26 10:30:57 Re: [SQL] database design SQL prob.