Re: IN vs EXISTS equivalence

From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: IN vs EXISTS equivalence
Date: 2008-08-18 09:02:37
Message-ID: 162867790808180202p28e79d42od1e0bedb45dcaf4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

I did some fast test on pagila database.

8.4
postgres=# explain analyze select * from film f where exists (select
film_id from film_actor where f.film_id = film_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=117.01..195.42 rows=966 width=390) (actual
time=36.011..43.314 rows=997 loops=1)
Hash Cond: (f.film_id = film_actor.film_id)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.027..1.971 rows=1000 loops=1)
-> Hash (cost=104.94..104.94 rows=966 width=2) (actual
time=35.886..35.886 rows=997 loops=1)
-> HashAggregate (cost=95.28..104.94 rows=966 width=2)
(actual time=32.650..34.139 rows=997 loops=1)
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462
width=2) (actual time=0.081..14.232 rows=5462 loops=1)
Total runtime: 45.373 ms
(7 rows)

8.3
postgres=# explain select * from film f where exists (select film_id
from film_actor where f.film_id = film_id);
QUERY PLAN
------------------------------------------------------------------------------------------
Seq Scan on film f (cost=0.00..4789.34 rows=500 width=390)
Filter: (subplan)
SubPlan
-> Index Scan using idx_fk_film_id on film_actor
(cost=0.00..28.35 rows=6 width=2)
Index Cond: ($0 = film_id)
(5 rows)

postgres=# explain analyze select * from film f where not exists
(select film_id from film_actor where f.film_id = film_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=149.90..240.24 rows=34 width=390) (actual
time=25.473..28.169 rows=3 loops=1)
Hash Cond: (f.film_id = film_actor.film_id)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.027..1.898 rows=1000 loops=1)
-> Hash (cost=81.62..81.62 rows=5462 width=2) (actual
time=24.398..24.398 rows=5462 loops=1)
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462
width=2) (actual time=0.035..12.400 rows=5462 loops=1)
Total runtime: 28.866 ms

postgres=# explain analyze select * from film f where not exists
(select film_id from film_actor where f.film_id = film_id);
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on film f (cost=0.00..4789.34 rows=500 width=390) (actual
time=5.874..22.654 rows=3 loops=1)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using idx_fk_film_id on film_actor
(cost=0.00..28.35 rows=6 width=2) (actual time=0.016..0.016 rows=1
loops=1000)
Index Cond: ($0 = film_id)
Total runtime: 22.835 ms
(6 rows)

postgres=# explain analyze select * from film f where film_id in
(select film_id from film_actor);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=117.01..195.42 rows=966 width=390) (actual
time=43.151..53.688 rows=997 loops=1)
Hash Cond: (f.film_id = film_actor.film_id)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.021..6.765 rows=1000 loops=1)
-> Hash (cost=104.94..104.94 rows=966 width=2) (actual
time=43.091..43.091 rows=997 loops=1)
-> HashAggregate (cost=95.28..104.94 rows=966 width=2)
(actual time=34.754..36.275 rows=997 loops=1)
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462
width=2) (actual time=0.024..15.746 rows=5462 loops=1)
Total runtime: 55.291 ms

postgres=# explain analyze select * from film f where film_id in
(select film_id from film_actor);
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop IN Join (cost=0.00..175.25 rows=986 width=390) (actual
time=0.090..14.272 rows=997 loops=1)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.014..1.877 rows=1000 loops=1)
-> Index Scan using idx_fk_film_id on film_actor (cost=0.00..0.54
rows=6 width=2) (actual time=0.007..0.007 rows=1 loops=1000)
Index Cond: (film_actor.film_id = f.film_id)
Total runtime: 15.902 ms
(5 rows)

8.4
postgres=# explain analyze select * from film f where film_id not in
(select film_id from film_actor);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on film f (cost=95.28..162.78 rows=500 width=390) (actual
time=24.324..26.015 rows=3 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462 width=2)
(actual time=0.026..12.074 rows=5462 loops=1)
Total runtime: 26.201 ms
(5 rows)

8.3
postgres=# explain analyze select * from film f where film_id not in
(select film_id from film_actor);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on film f (cost=95.28..162.78 rows=500 width=390) (actual
time=29.713..30.456 rows=3 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462 width=2)
(actual time=0.051..13.649 rows=5462 loops=1)
Total runtime: 30.644 ms
(5 rows)

8.4 is 10% faster than 8.3 on small table < 1000 rows. 8.4 has much
better prediction.

Regards
Pavel Stehule

2008/8/17 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> If you're still interested in testing CVS HEAD's handling of EXISTS,
> I've about finished what I wanted to do with it.
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-08-18 09:05:02 Re: Patch: plan invalidation vs stored procedures
Previous Message Dimitri Fontaine 2008-08-18 09:00:57 Re: Patch: plan invalidation vs stored procedures