Re: Can Postgres use an INDEX over an OR?

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Chris <dmagick(at)gmail(dot)com>
Cc: Robert James <srobertjames(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Can Postgres use an INDEX over an OR?
Date: 2009-07-20 15:05:19
Message-ID: 331e40660907200805x1e52bf6exb7ea12f9585cb812@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

20 липня 2009 р. 11:02 Chris <dmagick(at)gmail(dot)com> написав:

> Віталій Тимчишин wrote:
>
>>
>>
>> 2009/7/20 Robert James <srobertjames(at)gmail(dot)com <mailto:
>> srobertjames(at)gmail(dot)com>>
>>
>>
>> Hi. I notice that when I do a WHERE x, Postgres uses an index, and
>> when I do WHERE y, it does so as well, but when I do WHERE x OR y,
>> it doesn't. Why is this so?
>>
>> It's not clever enough.
>>
>
> Of course it is.

For simple cases

>
> I'm running 8.3.7.
>
> create table t1(id int primary key);
> insert into t1(id) select a from generate_series(1, 500000) as s(a);
> analyze t1;
>

explain analyze select * from t1 where
id < 10000

"Index Scan using t1_pkey on t1 (cost=0.00..322.51 rows=9612 width=4)
(actual time=0.030..3.700 rows=9999 loops=1)"
" Index Cond: (id < 10000)"
"Total runtime: 4.835 ms"

explain analyze select * from t1 where
id in (select (random() * 500000)::int4 from generate_series(0,10))

"Nested Loop (cost=32.50..1341.49 rows=200 width=4) (actual
time=15.353..67.014 rows=11 loops=1)"
" -> HashAggregate (cost=32.50..34.50 rows=200 width=4) (actual
time=0.028..0.043 rows=11 loops=1)"
" -> Function Scan on generate_series (cost=0.00..20.00 rows=1000
width=0) (actual time=0.014..0.020 rows=11 loops=1)"
" -> Index Scan using t1_pkey on t1 (cost=0.00..6.52 rows=1 width=4)
(actual time=6.083..6.084 rows=1 loops=11)"
" Index Cond: (t1.id = (((random() * 500000::double
precision))::integer))"
"Total runtime: 67.070 ms"

explain analyze select * from t1 where
id in (select (random() * 500000)::int4 from generate_series(0,10))
or
id < 10000

"Seq Scan on t1 (cost=22.50..9735.50 rows=254806 width=4) (actual
time=0.049..148.947 rows=10010 loops=1)"
" Filter: ((hashed subplan) OR (id < 10000))"
" SubPlan"
" -> Function Scan on generate_series (cost=0.00..20.00 rows=1000
width=0) (actual time=0.014..0.019 rows=11 loops=1)"
"Total runtime: 150.123 ms"

explain analyze
select * from t1 where
id in (select (random() * 500000)::int4 from generate_series(0,10))
union
select * from t1 where
id < 10000

"Unique (cost=2412.68..2461.74 rows=9812 width=4) (actual
time=89.190..95.014 rows=10010 loops=1)"
" -> Sort (cost=2412.68..2437.21 rows=9812 width=4) (actual
time=89.189..91.167 rows=10010 loops=1)"
" Sort Key: public.t1.id"
" Sort Method: quicksort Memory: 854kB"
" -> Append (cost=32.50..1762.13 rows=9812 width=4) (actual
time=16.641..76.338 rows=10010 loops=1)"
" -> Nested Loop (cost=32.50..1341.49 rows=200 width=4)
(actual time=16.641..70.051 rows=11 loops=1)"
" -> HashAggregate (cost=32.50..34.50 rows=200 width=4)
(actual time=0.033..0.049 rows=11 loops=1)"
" -> Function Scan on generate_series
(cost=0.00..20.00 rows=1000 width=0) (actual time=0.020..0.026 rows=11
loops=1)"
" -> Index Scan using t1_pkey on t1 (cost=0.00..6.52
rows=1 width=4) (actual time=6.359..6.361 rows=1 loops=11)"
" Index Cond: (public.t1.id = (((random() *
500000::double precision))::integer))"
" -> Index Scan using t1_pkey on t1 (cost=0.00..322.51
rows=9612 width=4) (actual time=0.023..4.075 rows=9999 loops=1)"
" Index Cond: (id < 10000)"
"Total runtime: 112.694 ms"

So, if it founds out anything complex, it sadly falls back to Sequence scan.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2009-07-20 15:52:05 Re: Used computers?
Previous Message Craig James 2009-07-20 14:29:13 Used computers?