Re: Bitmap scan is undercosted?

From: Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted?
Date: 2018-02-24 08:45:14
Message-ID: ebcb29c6-d9ce-ee44-a1fa-a89792ebf02a@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Sorry for delay with response, I had to switch to other tasks and didn't
have time to run proper tests and write some meaningful response.

Recently,  a similar issue happened with another our database, so I
decided to write an update.

Bitmap scan was preferred to index scan by the planner, but bitmap scan
was running worse in practice. Here are the relevant pieces of a much
bigger query plan:

 ->  Bitmap Heap Scan on cmdb_program_daily_usage
cmdb_program_daily_usage_6  (cost=6707.08..6879.35 rows=32 width=20)
(actual time=39.994..40.019 rows=12 loops=336)
       Recheck Cond: ((used_from = cmdb_ci_computer_12.id) AND
(usage_date >= '2018-02-02'::date) AND (usage_date <= '2018-02-12'::date))
       Filter: (((NOT thin_client) OR (thin_client IS NULL)) AND
(program_instance IS NOT NULL) AND (minutes_in_use > 0))
       Rows Removed by Filter: 69
       Heap Blocks: exact=2995
       Buffers: shared hit=563448
       ->  BitmapAnd  (cost=6707.08..6707.08 rows=154 width=0) (actual
time=39.978..39.978 rows=0 loops=336)
             Buffers: shared hit=560453
             ->  Bitmap Index Scan on idx_fk_5317241949468942 
(cost=0.00..133.87 rows=12641 width=0) (actual time=0.373..0.373
rows=4780 loops=336)
                   Index Cond: (used_from = cmdb_ci_computer_12.id)
                   Buffers: shared hit=5765
             ->  Bitmap Index Scan on idx_263911642415136 
(cost=0.00..6572.94 rows=504668 width=0) (actual time=40.873..40.873
rows=540327 loops=324)
                   Index Cond: ((usage_date >= '2018-02-02'::date) AND
(usage_date <= '2018-02-12'::date))
                   Buffers: shared hit=554688

 ->  Index Scan using idx_fk_5317241949468942 on
cmdb_program_daily_usage cmdb_program_daily_usage_6 (cost=0.56..24322.97
rows=35 width=20) (actual time=1.211..2.196 rows=14 loops=338)
       Index Cond: (used_from = cmdb_ci_computer_12.id)
       Filter: (((NOT thin_client) OR (thin_client IS NULL)) AND
(program_instance IS NOT NULL) AND (minutes_in_use > 0) AND (usage_date
>= '2018-02-02'::date) AND (usage_date <= '2018-02-12'::date))
       Rows Removed by Filter: 4786
       Buffers: shared hit=289812

The difference in run time does not look very huge, but when it's a part
of a loop, that could mean difference between minutes and hours.

After running some tests, here are the conclusions we've made:

- When running with cold cache, and data is being read from disk, then
the planner estimates look adequate. Bitmap scan has better costs, and
indeed it performs better in that case.

- When running with hot cache, and most of data is already in RAM, then
index scan starts to outperform bitmap scan. Unfortunately the planner
cannot account for the cache very well, and can't switch the plan.
Because even if the planner would ever learn to account for the current
content of shared buffers, it still can't know much about the content of
filesystem cache.

- Tests showed that the costs are dominated by random_page_cost, but
there is still potential to change the total plan cost, if "cpu_*" costs
would be less distant from "*_page_cost".

- In our case the data is likely to be in cache, so we decided to change
cost settings: seq_page_cost 1.0 -> 0.5; random_page_cost 1.1 -> 0.6

Regards,
Vitaliy

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-02-24 12:37:51 Re: check error messages in SSL tests
Previous Message Tom Kazimiers 2018-02-24 03:47:48 Unexpected behavior with transition tables in update statement trigger

Browse pgsql-performance by date

  From Date Subject
Next Message ldh@laurent-hasson.com 2018-02-24 17:19:21 RE: Updating large tables without dead tuples
Previous Message Stephen Frost 2018-02-24 00:09:40 Re: Updating large tables without dead tuples