Quick Links

Re: Can Postgres use an INDEX over an OR?

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Віталій Тимчишин <tivv00(at)gmail(dot)com>
Cc:	Chris <dmagick(at)gmail(dot)com>, Robert James <srobertjames(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Can Postgres use an INDEX over an OR?
Date:	2009-07-27 10:53:06
Message-ID:	603c8f070907270353i7669911cq1796ce2de49be744@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

2009/7/20 Віталій Тимчишин <tivv00(at)gmail(dot)com>:
> 20 липня 2009 р. 11:02 Chris <dmagick(at)gmail(dot)com> написав:
>>
>> Віталій Тимчишин wrote:
>>>
>>>
>>> 2009/7/20 Robert James <srobertjames(at)gmail(dot)com
>>> <mailto:srobertjames(at)gmail(dot)com>>
>>>
>>>
>>> Hi. I notice that when I do a WHERE x, Postgres uses an index, and
>>> when I do WHERE y, it does so as well, but when I do WHERE x OR y,
>>> it doesn't. Why is this so?
>>>
>>> It's not clever enough.
>>
>> Of course it is.
>
> For simple cases
>
>>
>>
>> I'm running 8.3.7.
>>
>> create table t1(id int primary key);
>> insert into t1(id) select a from generate_series(1, 500000) as s(a);
>> analyze t1;
>
> explain analyze select * from t1 where
> id < 10000
>
> "Index Scan using t1_pkey on t1 (cost=0.00..322.51 rows=9612 width=4)
> (actual time=0.030..3.700 rows=9999 loops=1)"
> " Index Cond: (id < 10000)"
> "Total runtime: 4.835 ms"
>
> explain analyze select * from t1 where
> id in (select (random() * 500000)::int4 from generate_series(0,10))
>
> "Nested Loop (cost=32.50..1341.49 rows=200 width=4) (actual
> time=15.353..67.014 rows=11 loops=1)"
> " -> HashAggregate (cost=32.50..34.50 rows=200 width=4) (actual
> time=0.028..0.043 rows=11 loops=1)"
> "        -> Function Scan on generate_series (cost=0.00..20.00 rows=1000
> width=0) (actual time=0.014..0.020 rows=11 loops=1)"
> " -> Index Scan using t1_pkey on t1 (cost=0.00..6.52 rows=1 width=4)
> (actual time=6.083..6.084 rows=1 loops=11)"
> "        Index Cond: (t1.id = (((random() * 500000::double
> precision))::integer))"
> "Total runtime: 67.070 ms"
>
> explain analyze select * from t1 where
> id in (select (random() * 500000)::int4 from generate_series(0,10))
> or
> id < 10000
>
> "Seq Scan on t1 (cost=22.50..9735.50 rows=254806 width=4) (actual
> time=0.049..148.947 rows=10010 loops=1)"
> " Filter: ((hashed subplan) OR (id < 10000))"
> " SubPlan"
> "    -> Function Scan on generate_series (cost=0.00..20.00 rows=1000
> width=0) (actual time=0.014..0.019 rows=11 loops=1)"
> "Total runtime: 150.123 ms"
>
> explain analyze
> select * from t1 where
> id in (select (random() * 500000)::int4 from generate_series(0,10))
> union
> select * from t1 where
> id < 10000
>
> "Unique (cost=2412.68..2461.74 rows=9812 width=4) (actual
> time=89.190..95.014 rows=10010 loops=1)"
> " -> Sort (cost=2412.68..2437.21 rows=9812 width=4) (actual
> time=89.189..91.167 rows=10010 loops=1)"
> "        Sort Key: public.t1.id"
> "        Sort Method: quicksort Memory: 854kB"
> "        -> Append (cost=32.50..1762.13 rows=9812 width=4) (actual
> time=16.641..76.338 rows=10010 loops=1)"
> "              -> Nested Loop (cost=32.50..1341.49 rows=200 width=4)
> (actual time=16.641..70.051 rows=11 loops=1)"
> "                    -> HashAggregate (cost=32.50..34.50 rows=200 width=4)
> (actual time=0.033..0.049 rows=11 loops=1)"
> "                          -> Function Scan on generate_series
> (cost=0.00..20.00 rows=1000 width=0) (actual time=0.020..0.026 rows=11
> loops=1)"
> "                    -> Index Scan using t1_pkey on t1 (cost=0.00..6.52
> rows=1 width=4) (actual time=6.359..6.361 rows=1 loops=11)"
> "                          Index Cond: (public.t1.id = (((random() *
> 500000::double precision))::integer))"
> "              -> Index Scan using t1_pkey on t1 (cost=0.00..322.51
> rows=9612 width=4) (actual time=0.023..4.075 rows=9999 loops=1)"
> "                    Index Cond: (id < 10000)"
> "Total runtime: 112.694 ms"

Hmm. What you're suggesting here is that we could consider
implementing OR conditions by rescanning the inner side for each index
qual and then unique-ifying the results on the index column. That's
probably possible, but it doesn't sound easy, especially since our
selectivity-estimation code for OR conditions is not very good, so we
might choose to do it this way when that's not actually the best plan.

...Robert

In response to

Re: Can Postgres use an INDEX over an OR? at 2009-07-20 15:05:19 from Віталій Тимчишин

Responses

Re: Can Postgres use an INDEX over an OR? at 2009-07-27 11:37:14 from Віталій Тимчишин

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Віталій Тимчишин	2009-07-27 11:37:14	Re: Can Postgres use an INDEX over an OR?
Previous Message	Eric Comeau	2009-07-27 10:43:07	Re: Very big insert/join performance problem (bacula)