Re: [repost] partial index / funxtional idx or bad sql?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jim(at)nasby(dot)net
Cc: csajl <csajl(at)yahoo(dot)com>, Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: [repost] partial index / funxtional idx or bad sql?
Date: 2003-05-13 14:46:12
Message-ID: 8275.1052837172@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> Wow, I'll have to keep that in mind. Shouldn't the optimizer be able to
> handle that? Could this get added to the TODO?

No, 'cause it's done (in CVS tip).

I'm actually a bit hesitant now to recommend that people do such things,
because the 7.4 optimizer is likely to produce a better plan from the
unmodified IN query than it will from any explicitly "improved" version.
The 7.4 code knows several ways to do IN efficiently, but when you
hand-transform the query you are forcing the choice; perhaps wrongly.

An example from CVS tip and the regression database in which hand
transformation forces a less efficient plan choice:

regression=# explain analyze select * from tenk1 a where unique1 in (select ten from tenk1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=483.17..484.91 rows=10 width=248) (actual time=407.14..409.16 rows=10 loops=1)
Merge Cond: ("outer".unique1 = "inner".ten)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..1571.97 rows=10000 width=244) (actual time=0.41..1.60 rows=11 loops=1)
-> Sort (cost=483.17..483.19 rows=10 width=4) (actual time=406.57..406.65 rows=10 loops=1)
Sort Key: tenk1.ten
-> HashAggregate (cost=483.00..483.00 rows=10 width=4) (actual time=406.08..406.26 rows=10 loops=1)
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=4) (actual time=0.19..261.84 rows=10000 loops=1)
Total runtime: 410.74 msec
(8 rows)

regression=# explain analyze select * from tenk1 a, (select distinct ten from tenk1) b
regression-# where a.unique1 = b.ten;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1122.39..1232.59 rows=10 width=248) (actual time=476.67..666.02 rows=10 loops=1)
-> Subquery Scan b (cost=1122.39..1172.39 rows=10 width=4) (actual time=475.94..662.00 rows=10 loops=1)
-> Unique (cost=1122.39..1172.39 rows=10 width=4) (actual time=475.89..661.65 rows=10 loops=1)
-> Sort (cost=1122.39..1147.39 rows=10000 width=4) (actual time=475.85..559.27 rows=10000 loops=1)
Sort Key: ten
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=4) (actual time=0.37..274.87 rows=10000 loops=1)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..6.01 rows=1 width=244) (actual time=0.27..0.31 rows=1 loops=10)
Index Cond: (a.unique1 = "outer".ten)
Total runtime: 687.53 msec
(9 rows)

So, for now, make the transformation ... but keep a note about the IN
version to try whenever you update to 7.4.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-05-13 16:15:34 Re: PERFORMANCE and SIZE
Previous Message Jim C. Nasby 2003-05-13 10:58:17 Re: [repost] partial index / funxtional idx or bad sql?