Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
Date: 2005-05-18 05:12:17
Message-ID: 1116393137.17217.24.camel@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2005-05-16 at 14:35 -0400, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > On Mon, 2005-05-16 at 09:53 -0400, Tom Lane wrote:
> >> This is a fallacy, and I think your concern is largely mistaken. Have
> >> you experimented with the cases you are worried about?
>
> > Perhaps I have not stated the problem clearly. Believe me, I have
> > experimented extensively with this problem.
>
> Sorry, perhaps I wasn't clear: have you experimented *with CVS tip*?
> The current code is certainly capable of choosing either seqscan,
> bitmap scan, or traditional index scan depending on the given query.
> For example,

[...]

> I think the work that's most needed at the moment is to test the
> bitmap-scan cost model to see if it has much to do with reality...

The cost model seems to work pretty darn well, as a matter of fact.

This table is in-order by id, in-order by date, and randomly ordered by
"random".

scratch=# \d test
Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
date | date |
random | integer |
Indexes:
"test_pk" UNIQUE, btree (id)
"test_date_idx" btree (date)
"test_rand_idx" btree (random)

scratch=# select count(1) from test;
count
----------
10000000

The size of the tables and indexes is about 1G, or double physical
memory. I tested by selecting on the random column. For 100 random
values, I selected the row that matches exactly, then in ranges of 1000,
10000, 100000, and 1000000. These touch roughly 5, 50, 500, and 5000
tuples, respectively. The test is repeated for index scan, bitmap scan,
and sequential scan.

Example query:

select count(1)
from test
where random >= 1429076987
and random < 1429076987 + 10000;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=171.31..171.31 rows=1 width=0) (actual time=0.996..1.000 rows=1 loops=1)
-> Bitmap Heap Scan on test (cost=2.26..171.20 rows=43 width=0) (actual time=0.145..0.829 rows=42 loops=1)
Recheck Cond: ((random >= 1429076987) AND (random < 1429086987))
-> Bitmap Index Scan on test_rand_idx (cost=0.00..2.26 rows=43 width=0) (actual time=0.063..0.063 rows=42 loops=1)
Index Cond: ((random >= 1429076987) AND (random < 1429086987))
Total runtime: 1.114 ms

100 queries returning | 1 | 5 | 50 | 500 | 5000 |
----------------------+-----+-----+------+-------+--------+
bitmap scan | 1s | 2s | 13s | 1m41s | 11m16s |
index scan | 1s | 1s | 12s | 2m6s | 24m19s |
----------------------+-----+-----+------+-------+--------+
sequential scan | 28m20s |

The planner uses index scan for the first two columns, and chooses
bitmap scan for the rest. This would seem to be a reasonable decision,
given the results. The only real problem with the cost model in further
testing is that it doesn't switch to seq scan quickly enough. If bitmap
scan is disabled, the planner will pick index scan even in cases when
sequential scan is 10x faster:

scratch=# set enable_bitmapscan to off;
SET
scratch=# explain analyze select count(1) from test where random >= 1429076987 and random < 1429076987 + 10000000;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=170142.03..170142.03 rows=1 width=0) (actual time=177419.182..177419.185 rows=1 loops=1)
-> Index Scan using test_rand_idx on test (cost=0.00..170034.11 rows=43167 width=0) (actual time=0.035..177255.696 rows=46764 loops=1)
Index Cond: ((random >= 1429076987) AND (random < 1439076987))
Total runtime: 177419.302 ms
(4 rows)

scratch=# set enable_indexscan to off;
SET
scratch=# explain analyze select count(1) from test where random >= 1429076987 and random < 1429076987 + 10000000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=204165.55..204165.55 rows=1 width=0) (actual time=12334.042..12334.045 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..204057.62 rows=43167 width=0) (actual time=17.436..12174.150 rows=46764 loops=1)
Filter: ((random >= 1429076987) AND (random < 1439076987))
Total runtime: 12334.156 ms
(4 rows)

Obviously in this case sequential scan was (would have been) a huge win.
Incrementing random_page_cost from 4 (the default) to 5 causes the
planner to make a better decision.

-jwb

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-05-18 05:12:26 Re: Cost of XLogInsert CRC calculations
Previous Message Greg Stark 2005-05-18 05:04:41 Re: Cost of XLogInsert CRC calculations