Re: Can Postgres use an INDEX over an OR?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Віталій Тимчишин <tivv00(at)gmail(dot)com>
Cc: Chris <dmagick(at)gmail(dot)com>, Robert James <srobertjames(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Can Postgres use an INDEX over an OR?
Date: 2009-07-27 10:53:06
Message-ID: 603c8f070907270353i7669911cq1796ce2de49be744@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2009/7/20 Віталій Тимчишин <tivv00(at)gmail(dot)com>:
> 20 липня 2009 р. 11:02 Chris <dmagick(at)gmail(dot)com> написав:
>>
>> Віталій Тимчишин wrote:
>>>
>>>
>>> 2009/7/20 Robert James <srobertjames(at)gmail(dot)com
>>> <mailto:srobertjames(at)gmail(dot)com>>
>>>
>>>
>>>    Hi. I notice that when I do a WHERE x, Postgres uses an index, and
>>>    when I do WHERE y, it does so as well, but when I do WHERE x OR y,
>>>    it doesn't. Why is this so?
>>>
>>> It's not clever enough.
>>
>> Of course it is.
>
> For simple cases
>
>>
>>
>> I'm running 8.3.7.
>>
>> create table t1(id int primary key);
>> insert into t1(id) select a from generate_series(1, 500000) as s(a);
>> analyze t1;
>
> explain analyze select * from t1 where
> id < 10000
>
> "Index Scan using t1_pkey on t1  (cost=0.00..322.51 rows=9612 width=4)
> (actual time=0.030..3.700 rows=9999 loops=1)"
> "  Index Cond: (id < 10000)"
> "Total runtime: 4.835 ms"
>
> explain analyze select * from t1 where
> id in (select (random() * 500000)::int4 from generate_series(0,10))
>
> "Nested Loop  (cost=32.50..1341.49 rows=200 width=4) (actual
> time=15.353..67.014 rows=11 loops=1)"
> "  ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual
> time=0.028..0.043 rows=11 loops=1)"
> "        ->  Function Scan on generate_series  (cost=0.00..20.00 rows=1000
> width=0) (actual time=0.014..0.020 rows=11 loops=1)"
> "  ->  Index Scan using t1_pkey on t1  (cost=0.00..6.52 rows=1 width=4)
> (actual time=6.083..6.084 rows=1 loops=11)"
> "        Index Cond: (t1.id = (((random() * 500000::double
> precision))::integer))"
> "Total runtime: 67.070 ms"
>
> explain analyze select * from t1 where
> id in (select (random() * 500000)::int4 from generate_series(0,10))
> or
> id < 10000
>
> "Seq Scan on t1  (cost=22.50..9735.50 rows=254806 width=4) (actual
> time=0.049..148.947 rows=10010 loops=1)"
> "  Filter: ((hashed subplan) OR (id < 10000))"
> "  SubPlan"
> "    ->  Function Scan on generate_series  (cost=0.00..20.00 rows=1000
> width=0) (actual time=0.014..0.019 rows=11 loops=1)"
> "Total runtime: 150.123 ms"
>
> explain analyze
> select * from t1 where
> id in (select (random() * 500000)::int4 from generate_series(0,10))
> union
> select * from t1 where
> id < 10000
>
> "Unique  (cost=2412.68..2461.74 rows=9812 width=4) (actual
> time=89.190..95.014 rows=10010 loops=1)"
> "  ->  Sort  (cost=2412.68..2437.21 rows=9812 width=4) (actual
> time=89.189..91.167 rows=10010 loops=1)"
> "        Sort Key: public.t1.id"
> "        Sort Method:  quicksort  Memory: 854kB"
> "        ->  Append  (cost=32.50..1762.13 rows=9812 width=4) (actual
> time=16.641..76.338 rows=10010 loops=1)"
> "              ->  Nested Loop  (cost=32.50..1341.49 rows=200 width=4)
> (actual time=16.641..70.051 rows=11 loops=1)"
> "                    ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4)
> (actual time=0.033..0.049 rows=11 loops=1)"
> "                          ->  Function Scan on generate_series
> (cost=0.00..20.00 rows=1000 width=0) (actual time=0.020..0.026 rows=11
> loops=1)"
> "                    ->  Index Scan using t1_pkey on t1  (cost=0.00..6.52
> rows=1 width=4) (actual time=6.359..6.361 rows=1 loops=11)"
> "                          Index Cond: (public.t1.id = (((random() *
> 500000::double precision))::integer))"
> "              ->  Index Scan using t1_pkey on t1  (cost=0.00..322.51
> rows=9612 width=4) (actual time=0.023..4.075 rows=9999 loops=1)"
> "                    Index Cond: (id < 10000)"
> "Total runtime: 112.694 ms"

Hmm. What you're suggesting here is that we could consider
implementing OR conditions by rescanning the inner side for each index
qual and then unique-ifying the results on the index column. That's
probably possible, but it doesn't sound easy, especially since our
selectivity-estimation code for OR conditions is not very good, so we
might choose to do it this way when that's not actually the best plan.

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Віталій Тимчишин 2009-07-27 11:37:14 Re: Can Postgres use an INDEX over an OR?
Previous Message Eric Comeau 2009-07-27 10:43:07 Re: Very big insert/join performance problem (bacula)