Quick Links

Re: slow query

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Ryan Bradetich <rbradetich(at)uswest(dot)net>, Oleg Lebedev <oleg(dot)lebedev(at)waterford(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: slow query
Date:	2003-02-24 02:28:54
Message-ID:	21154.1046053734@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> I am assuming you said this because EXISTS is faster for > 12 rows?

> That's my rule of thumb, *NOT* any kind of relational-calculus-based truth.

Keep in mind also that the tradeoffs will change quite a lot when PG 7.4
hits the streets, because the optimizer has gotten a lot smarter about
how to handle IN, but no smarter about EXISTS. Here's one rather silly
example using CVS tip:

regression=# explain analyze select * from tenk1 a where
regression-# unique1 in (select hundred from tenk1 b);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=486.32..504.11 rows=100 width=248) (actual time=453.19..468.86 rows=100 loops=1)
Merge Cond: ("outer".unique1 = "inner".hundred)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..1571.87 rows=10000 width=244) (actual time=0.12..5.25 rows=101 loops=1)
-> Sort (cost=486.32..486.57 rows=100 width=4) (actual time=452.91..453.83 rows=100 loops=1)
Sort Key: b.hundred
-> HashAggregate (cost=483.00..483.00 rows=100 width=4) (actual time=447.59..449.80 rows=100 loops=1)
-> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=4) (actual time=0.06..276.47 rows=10000 loops=1)
Total runtime: 472.06 msec
(8 rows)

regression=# explain analyze select * from tenk1 a where
regression-# exists (select 1 from tenk1 b where b.hundred = a.unique1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on tenk1 a (cost=0.00..35889.66 rows=5000 width=244) (actual time=3.69..1591.78 rows=100 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using tenk1_hundred on tenk1 b (cost=0.00..354.32 rows=100 width=0) (actual time=0.10..0.10 rows=0 loops=10000)
Index Cond: (hundred = $0)
Total runtime: 1593.88 msec
(6 rows)

The EXISTS case takes about the same time in 7.3, but the IN case is off
the charts (I got bored of waiting after 25 minutes...)

regards, tom lane

In response to

Re: slow query at 2003-02-23 21:47:58 from Josh Berkus

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Schaefer, Mario	2003-02-24 09:52:55	partitioning os swap data log tempdb
Previous Message	Josh Berkus	2003-02-23 21:47:58	Re: slow query