Re: Bitmap table scan cost per page formula

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Haisheng Yuan <hyuan(at)pivotal(dot)io>
Cc: pgsql-hackers(at)postgresql(dot)org, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Bitmap table scan cost per page formula
Date: 2017-12-20 03:25:00
Message-ID: 20171220032459.GJ18184@telsasoft.com
Lists: pgsql-hackers

On Tue, Dec 19, 2017 at 07:55:32PM +0000, Haisheng Yuan wrote:
> Hi hackers,
>
> This is Haisheng Yuan from Greenplum Database.
>
> We had some queries in production showing that the planner favors seqscan over
> bitmapscan, where the execution of seqscan is 5x slower than using
> bitmapscan, but the cost of bitmapscan is 2x the cost of seqscan. The
> statistics were up to date and quite accurate.
>
> Bitmap table scan uses a formula to interpolate between random_page_cost
> and seq_page_cost to determine the cost per page. In Greenplum Database,
> the default value of random_page_cost is 100, the default value of
> seq_page_cost is 1. With the original cost formula, random_page_cost
> dominates the final cost result: even though the formula is declared to be
> non-linear, it is still close to linear, which can't reflect the real cost
> per page when a majority of pages are fetched. Therefore, the cost formula is
> updated to a truly non-linear function that reflects both random_page_cost
> and seq_page_cost across different percentages of pages fetched.
>
> Even though PostgreSQL has a much smaller default value of random_page_cost
> (4), the same problem exists there if we change the default value.
>
> Old cost formula:
> cost_per_page = random_page_cost - (random_page_cost - seq_page_cost) *
> sqrt(pages_fetched / T);
> [image: Inline image 1]
>
> New cost formula:
> cost_per_page = seq_page_cost * pow(random_page_cost / seq_page_cost, 1 -
> sqrt(pages_fetched / T));
> [image: Inline image 2]
>
> Below is the graph (credit to Heikki) that plots the total estimated cost
> of a bitmap heap scan, where table size is 10000 pages, and
> random_page_cost=10 and seq_page_cost=1. The X axis is the number of pages
> fetched, i.e. on the left no pages are fetched, and at the right end, at
> 10000, all pages are fetched. The original formula is in black, the new
> formula in red, and the horizontal blue line shows the cost of a Seq
> Scan.
> [image: Inline image 3]

Thanks for caring about bitmap scans ;)
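
To make the difference concrete, here's a tiny standalone sketch of both
interpolations (plain C, not the actual costsize.c code; the GUC values and
table size are just the ones from your plot):

	#include <math.h>
	#include <stdio.h>

	int
	main(void)
	{
		double		random_page_cost = 10.0;	/* per the plot above */
		double		seq_page_cost = 1.0;
		double		T = 10000.0;	/* table size in pages */
		double		pages_fetched;

		for (pages_fetched = 1000; pages_fetched <= T; pages_fetched += 3000)
		{
			double		frac = sqrt(pages_fetched / T);

			/* old: linear interpolation on sqrt of the fraction fetched */
			double		old_cpp = random_page_cost -
				(random_page_cost - seq_page_cost) * frac;

			/* new: geometric interpolation between the two GUCs */
			double		new_cpp = seq_page_cost *
				pow(random_page_cost / seq_page_cost, 1.0 - frac);

			printf("%5.0f pages: old %.2f/page, new %.2f/page\n",
				   pages_fetched, old_cpp, new_cpp);
		}
		return 0;
	}

The new formula falls toward seq_page_cost much faster as the fraction fetched
grows, which seems to be the point of the change.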

There are a couple of earlier/ongoing discussions on this:

In this old thread: https://www.postgresql.org/message-id/CAGTBQpZ%2BauG%2BKhcLghvTecm4-cGGgL8vZb5uA3%3D47K7kf9RgJw%40mail.gmail.com
..Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> Correct me if I'm wrong, but this looks like the planner not
> accounting for correlation when using bitmap heap scans.
>
> Checking the source, it really doesn't.

..which I think is basically right: the formula does distinguish between the
cases of a small or a large fraction of pages, but doesn't use correlation. Our
issue in that case seems to be mostly a failure of cost_index to account for
fine-scale deviations from large-scale correlation; but if cost_bitmap_heap_scan
accounted for our high correlation metric (>0.99), it might've helped our case.
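
For reference, cost_index's interpolation looks like this (from costsize.c,
lightly abridged):

	/*
	 * Now interpolate based on estimated index order correlation to get
	 * total disk I/O cost for main table accesses.
	 */
	csquared = indexCorrelation * indexCorrelation;

	run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

..where max_IO_cost assumes fully random heap access and min_IO_cost assumes
perfectly sequential access.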

Note costsize.c:
* Save amcostestimate's results for possible use in bitmap scan planning.
* We don't bother to save indexStartupCost or indexCorrelation, because a
* BITMAP SCAN DOESN'T CARE ABOUT EITHER.

See more at this recent/ongoing discussion (involving several issues, only one
of which is bitmap cost vs index cost):
https://www.postgresql.org/message-id/flat/20171206214652(dot)GA13889%40telsasoft(dot)com#20171206214652(dot)GA13889(at)telsasoft(dot)com

Consider the bitmap scans in the two different cases:

1) In Vitaliy's case, bitmap was being chosen in preference to index scan (due
in part to random_page_cost>1), but performed poorly, partially because the
bitmap component must complete before the heap reads can begin, and also
because the heap reads for the test case involving modular division would've
read pages across the entire length of the table, incurring maximum lseek
costs.

2) In my case from ~16 months ago, index scan was being chosen in preference to
bitmap, but the index scan was incurring high seek costs. We would've been
happy if the planner had done a bitmap scan on a weekly-partitioned child table
(with 7 days of data) while querying one day's data (1/7th of the table), 99%
of which would have been strictly sequential page reads, incurring low lseek
costs (plus some component of random costs for the remaining 1% "long tail").

It seems clear to me that sequentially reading 1/7th of the tightly clustered
pages in a table ought to be costed differently than reading 1/7th of the pages
evenly distributed across its entire length.
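
(To put numbers on it: with the Greenplum defaults quoted above,
random_page_cost=100 and seq_page_cost=1, and pages_fetched/T = 1/7, the old
formula gives cost_per_page = 100 - 99 * sqrt(1/7) ~= 62.6 in both cases,
since correlation never enters into it.)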

I started playing with this weeks ago (probably during Vitaliy's problem
report). Is there any reason cost_bitmap_heap_scan shouldn't interpolate based
on correlation from seq_page_cost to random_page_cost, the same as cost_index
does?
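
Something like this, say (purely a sketch, not a patch; min_cost_per_page and
max_cost_per_page are hypothetical names for the sequential and random
extremes, and indexCorrelation would first have to be saved for bitmap scan
planning, per the costsize.c comment above):

	/*
	 * Hypothetical: reuse cost_index's csquared trick to slide the
	 * bitmap heap scan's cost_per_page between the sequential and
	 * random extremes.
	 */
	csquared = indexCorrelation * indexCorrelation;
	cost_per_page = max_cost_per_page +
		csquared * (min_cost_per_page - max_cost_per_page);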

Justin
