Re: Question about (probably wrong) index scan cost for conditional indexes

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question about (probably wrong) index scan cost for conditional indexes
Date: 2012-01-23 23:14:58
Message-ID: CAK-MWwTXTCHj2v7DX14sKkwAZ7Pfj+nwpiqHe0LmhGVjby9tZw@mail.gmail.com
Lists: pgsql-general

Hi.

It seems my previous test case did not clearly demonstrate the problem I am stuck with.

Here is a better test case, much closer to reality:

Preparation:
set random_page_cost to 4;
set seq_page_cost to 1;

create table test (id integer primary key, sections integer[], value float);
insert into test select id, ('{'||((random()*10)::integer)||'}')::integer[]
as sections, random() as value from generate_series(1,1000000) as g(id);
--generic gist index for array
CREATE INDEX test_sections_gist on test using gist(sections);
--specialized index on value for sections && '{2}'
CREATE INDEX test_value_in2section_key on test(value) where sections && '{2}';
analyze test;
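(An aside on the data, a back-of-the-envelope sketch in Python rather than anything from the planner: assuming PostgreSQL's float-to-integer cast rounds to the nearest integer, roughly 10% of the rows should satisfy `sections && '{2}'`, which agrees with the rows=99641 reported by the actual plans below.)

```python
def p_value(v: int) -> float:
    """Probability that (random()*10)::integer equals v, given round-to-nearest:
    0 and 10 each get half a unit interval, 1..9 get a full one."""
    return 0.05 if v in (0, 10) else 0.10

n_rows = 1_000_000
expected_matches = n_rows * p_value(2)
print(round(expected_matches))  # ~100000, vs rows=99641 observed below
```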

Now actual tests:

A good, fast plan, but the estimated cost is definitely wrong:
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..539.29 rows=100 width=37) (actual time=0.043..0.499 rows=100 loops=1)
   ->  Index Scan using test_value_in2section_key on test  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.040..0.434 rows=100 loops=1)
 Total runtime: 0.570 ms

Compare with an almost equivalent query:
postgres=# EXPLAIN ANALYZE SELECT * from test order by id limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.43 rows=100 width=37) (actual time=0.057..0.192 rows=100 loops=1)
   ->  Index Scan using test_pkey on test  (cost=0.00..34317.36 rows=1000000 width=37) (actual time=0.054..0.115 rows=100 loops=1)
 Total runtime: 0.258 ms

The actual speed is almost the same, but the estimated cost differs by a factor of more than 100.
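The two Limit costs can be reproduced by hand. A sketch, assuming (as I understand it) that a Limit node charges a proportional share of its input's total cost; all the numbers are taken verbatim from the two plans above:

```python
def limit_cost(input_total_cost, input_est_rows, limit_rows):
    # The planner assumes matching tuples are spread uniformly, so fetching
    # the first `limit_rows` tuples costs a proportional share of the scan.
    return input_total_cost * limit_rows / input_est_rows

# Partial-index query: scan estimated at total cost 5392.87 for 1000 rows.
print(limit_cost(5392.87, 1_000, 100))       # -> ~539.29, as in the plan
# Primary-key query: scan estimated at total cost 34317.36 for 1000000 rows.
print(limit_cost(34317.36, 1_000_000, 100))  # -> ~3.43, as in the plan
```

So the gap comes entirely from the inner scan estimates: per estimated row, the partial index scan is priced at 5392.87/1000 ≈ 5.4, versus 34317.36/1000000 ≈ 0.034 for the primary key, more than 150 times higher.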

Now if I increase the limit, I start getting slow plans, because the planner switches to the GiST index and a bitmap scan (the estimated cost of the plain index scan is too high):

postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 1000;
                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2941.68..2944.18 rows=1000 width=37) (actual time=175.301..175.766 rows=1000 loops=1)
   ->  Sort  (cost=2941.68..2944.18 rows=1000 width=37) (actual time=175.298..175.541 rows=1000 loops=1)
         Sort Key: value
         Sort Method: top-N heapsort  Memory: 127kB
         ->  Bitmap Heap Scan on test  (cost=56.48..2891.85 rows=1000 width=37) (actual time=80.230..132.479 rows=99641 loops=1)
               Recheck Cond: (sections && '{2}'::integer[])
               ->  Bitmap Index Scan on test_sections_gist  (cost=0.00..56.23 rows=1000 width=0) (actual time=78.112..78.112 rows=99641 loops=1)
                     Index Cond: (sections && '{2}'::integer[])
 Total runtime: 175.960 ms
(9 rows)
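Note where rows=1000 comes from. A sketch, assuming the planner falls back on the hard-wired default selectivity of 0.001 for `&&` (contsel) when it has no usable statistics for the clause; the actual row count is from the Bitmap Index Scan line above:

```python
CONTSEL_DEFAULT = 0.001  # assumption: built-in default selectivity for &&

n_rows = 1_000_000
estimated = round(n_rows * CONTSEL_DEFAULT)  # the rows=1000 in the plan
actual = 99641                               # actual rows from the plan
print(estimated, round(actual / estimated, 1))  # estimate is ~100x too low
```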

Even if I drop the GiST index, I still get the wrong plan:
postgres=# drop index test_sections_gist;
DROP INDEX
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 1000;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4489.88..4492.38 rows=1000 width=37) (actual time=116.637..117.088 rows=1000 loops=1)
   ->  Sort  (cost=4489.88..4492.38 rows=1000 width=37) (actual time=116.635..116.857 rows=1000 loops=1)
         Sort Key: value
         Sort Method: top-N heapsort  Memory: 127kB
         ->  Bitmap Heap Scan on test  (cost=1604.68..4440.05 rows=1000 width=37) (actual time=22.175..74.556 rows=99641 loops=1)
               Recheck Cond: (sections && '{2}'::integer[])
               ->  Bitmap Index Scan on test_value_in2section_key  (cost=0.00..1604.43 rows=1000 width=0) (actual time=20.248..20.248 rows=99641 loops=1)
 Total runtime: 117.261 ms

Only if I completely disable bitmap scans do I get the good, fast plan (though still with an exceptionally high estimated cost):

postgres=# set enable_bitmapscan to 0;
SET
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 1000;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.047..4.123 rows=1000 loops=1)
   ->  Index Scan using test_value_in2section_key on test  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.044..3.552 rows=1000 loops=1)
 Total runtime: 4.460 ms

I hope this test case makes the issue clearer.

Regards,
Maksym

On Mon, Jan 23, 2012 at 11:46 AM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

>
> On Mon, Jan 23, 2012 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> writes:
>> > But it seems that index scan cost for very narrow/selective conditional
>> > indexes is greatly overestimated at least in some cases.
>>
>> I realized in connection with
>> http://archives.postgresql.org/pgsql-general/2012-01/msg00459.php
>> that btcostestimate is not correctly estimating numIndexTuples for
>> partial indexes. But it's impossible to tell from this amount of
>> information whether you're seeing an effect of that, or something else.
>> Can you provide a self-contained test case?
>>
>> regards, tom lane
>>
>
> Probably the simplest test case:
>
> set random_page_cost to 4;
> set seq_page_cost to 1;
> drop table if exists test;
> CREATE TABLE test (id integer primary key, value1 float, value2 float,
> value3 float, value4 float);
> INSERT into test select id,random() as value1,random() as value2, random()
> as value3,random() as value4 from generate_series(1,1000000) as g(id);
> CREATE INDEX test_special_key on test(value1) where value2*2<0.01 and
> value3*2<0.01 and value4*2<0.01;
> ANALYZE test;
>
> postgres=# EXPLAIN ANALYZE select * from test order by id limit 100;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..3.43 rows=100 width=36) (actual time=0.042..0.170 rows=100 loops=1)
>    ->  Index Scan using test_pkey on test  (cost=0.00..34317.36 rows=1000000 width=36) (actual time=0.040..0.108 rows=100 loops=1)
>  Total runtime: 0.243 ms
> (3 rows)
>
> vs
>
> postgres=# EXPLAIN ANALYZE select * from test where value2*2<0.01 and
> value3*2<0.01 and value4*2<0.01 order by value1 limit 100;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..92.52 rows=100 width=36) (actual time=0.072..0.072 rows=0 loops=1)
>    ->  Index Scan using test_special_key on test  (cost=0.00..34264.97 rows=37037 width=36) (actual time=0.070..0.070 rows=0 loops=1)
>  Total runtime: 0.113 ms
> (3 rows)
>
> cost difference:
> (cost=0.00..3.43 rows=100 width=36)
> vs
> (cost=0.00..92.52 rows=100 width=36)
>
> The actual speed (and theoretical performance) is almost the same.
>
> The more selective the conditions added to the conditional index, the
> worse the cost misestimation becomes.
>
>
> Kind Regards,
> Maksym
>
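For the quoted test case, the rows=37037 estimate can be reproduced the same way, assuming the planner applies its default selectivity of 1/3 (DEFAULT_INEQ_SEL) to each `expr < const` clause for which it has no statistics:

```python
DEFAULT_INEQ_SEL = 1 / 3  # assumption: planner default for "expr < const"

n_rows = 1_000_000
# Three such clauses are treated as independent and multiplied together.
est = n_rows * DEFAULT_INEQ_SEL ** 3
print(round(est))  # -> 37037, exactly the estimate in the quoted plan
```

Three stacked defaults give 1/27 of a million rows, matching the quoted plan exactly, even though the partial index actually matched zero rows.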

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
