Subquery WHERE IN or WHERE EXISTS faster?

From: Ulrich <ulrich(dot)mierendorff(at)gmx(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Subquery WHERE IN or WHERE EXISTS faster?
Date: 2008-06-28 15:22:41
Message-ID: 48665741.2060900@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,
I am new to SQL and have two tables..., "processor" and
"users_processors". The first table contains Processors:

CREATE TABLE processor (
id SERIAL,
speed varchar(50) NOT NULL,
type int2 NOT NULL,
PRIMARY KEY (id)
);
CREATE UNIQUE INDEX processor_speed_index ON processors(lower(speed));

Example:
1 "100MHz" 0
2 "36GHz" 7
...

The second Table defines which processor one user has got:

CREATE TABLE users_processors (
userid int REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
processorid int REFERENCES processors ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY(userid, processorid)
);
CREATE INDEX users_processors_processorid_index ON
users_processors(processorid);
CREATE INDEX users_processors_processorid_index ON
users_processors(processorid);

Example:
1 2
1 3
1 4
...
2 1
2 2
...
(The user "1" own processors 2,3,4 and the user 2 owns processors 1,2)

__________________________________________________________

Now, I would like to list all processors user "1" has got. The following
query does that:
SELECT speed FROM processors WHERE id IN (SELECT processorid FROM
users_processors WHERE userid=1) ORDER BY speed ASC LIMIT 10 OFFSET 2;

This would return 10 processors beginning with number 3. I have read,
that this query is slow and can be faster. I analyzed it:
Limit (cost=22.90..22.90 rows=1 width=118) (actual time=0.344..0.349
rows=9 loops=1)
-> Sort (cost=22.90..22.90 rows=2 width=118) (actual
time=0.341..0.341 rows=11 loops=1)
Sort Key: processors.speed
Sort Method: quicksort Memory: 17kB
-> Nested Loop (cost=15.03..22.89 rows=2 width=118) (actual
time=0.225..0.289 rows=11 loops=1)
-> HashAggregate (cost=15.03..15.05 rows=2 width=4)
(actual time=0.207..0.214 rows=11 loops=1)
-> Bitmap Heap Scan on users_processors
(cost=4.34..15.01 rows=11 width=4) (actual time=0.175..0.179 rows=11
loops=1)
Recheck Cond: (userid = 1)
-> Bitmap Index Scan on
users_processors_userid_index (cost=0.00..4.33 rows=11 width=0) (actual
time=0.159..0.159 rows=12 loops=1)
Index Cond: (userid = 1)
-> Index Scan using processors_pkey on processors
(cost=0.00..3.90 rows=1 width=122) (actual time=0.004..0.004 rows=1
loops=11)
Index Cond: (processors.id =
users_processors.processorid)
Total runtime: 0.478 ms
(13 rows)

__________________________________________________________

People say that this query is faster:
SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
users_processors WHERE userid=1 AND processorid=processors.id) ORDER BY
speed ASC LIMIT 10 OFFSET 2;

Analyze returns:
Limit (cost=4404.52..4404.55 rows=10 width=118) (actual
time=0.179..0.184 rows=9 loops=1)
-> Sort (cost=4404.52..4405.18 rows=265 width=118) (actual
time=0.176..0.177 rows=11 loops=1)
Sort Key: processors.speed
Sort Method: quicksort Memory: 17kB
-> Seq Scan on processors (cost=0.00..4398.44 rows=265
width=118) (actual time=0.056..0.118 rows=11 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using users_processors_pkey on
users_processors (cost=0.00..8.27 rows=1 width=0) (actual
time=0.006..0.006 rows=1 loops=11)
Index Cond: ((userid = 1) AND (processorid = $0))
Total runtime: 0.267 ms
(10 rows)

The second query is faster, but I have only used a very small table with
less than 20 items. In real-world I will have tables with thousands of
entries. I wonder if the second query is also faster in cases where I
have big tables, because it does a "Seq Scan", for me this looks like a
complete table scan. This seams reasonable if we look at the query I do
not expect that it is possible to use an INDEX for the second query. So,
is it slower?

Which query would you use, the first or the second one?

I would also like to know the total number of processors one user has
got. I would use one of those queries and replace the "SELECT speed"
with "SELECT count(*)" and remove the LIMIT and OFFSET. Is this good? I
have read that count(*) is slow.

Kind regards
Ulrich

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2008-06-28 15:53:43 Re: Subquery WHERE IN or WHERE EXISTS faster?
Previous Message Chris Browne 2008-06-27 18:16:57 Re: Federated Postgresql architecture ?