Bitmap table scan cost per page formula

From: Haisheng Yuan <hyuan(at)pivotal(dot)io>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bitmap table scan cost per page formula
Date: 2017-12-19 19:55:32
Message-ID: CAPW_87H0X2ZAr_oZu6MTS62h=_oyi9NWU=jnPApcxjB8BfcK0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

This is Haisheng Yuan from Greenplum Database.

We had some query in production showing that planner favors seqscan over
bitmapscan, and the execution of seqscan is 5x slower than using
bitmapscan, but the cost of bitmapscan is 2x the cost of seqscan. The
statistics were updated and quite accurate.

Bitmap table scan uses a formula to interpolate between random_page_cost
and seq_page_cost to determine the cost per page. In Greenplum Database,
the default value of random_page_cost is 100, the default value of
seq_page_cost is 1. With the original cost formula, random_page_cost
dominates in the final cost result, even the formula is declared to be
non-linear. However, it is still more like linear, which can't reflect the
real cost per page when a majority of pages are fetched. Therefore, the
cost formula is updated to real non-linear function to reflect both
random_page_cost and seq_page_cost for different percentage of pages
fetched.

Even though PostgreSQL has much smaller default value of random_page_cost
(4), the same problem exists there if we change the default value.

Old cost formula:
cost_per_page = random_page_cost - (random_page_cost - seq_page_cost) *
sqrt(pages_fetched / T);
[image: Inline image 1]

New cost formula:
cost_per_page = seq_page_cost * pow(random_page_cost / seq_page_cost, 1 -
sqrt(pages_fetched / T));
[image: Inline image 2]

Below is the graph (credit to Heikki) that plots the total estimated cost
of a bitmap heap scan, where table size is 10000 pages, and
random_page_cost=10 and seq_page_cost=1. X axis is the number of pages
fetche. I.e. on the left, no pages are fetched, and on the right end, at
10000, all pages are fetched. The original formula is in black, the new
formula in red, and the horizontal line, in blue, shows the cost of a Seq
Scan.
[image: Inline image 3]

Thoughts? Any better ideas?

Thanks!
Haisheng Yuan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-12-19 19:58:49 Re: [HACKERS] replace GrantObjectType with ObjectType
Previous Message Peter Eisentraut 2017-12-19 19:52:00 Re: [HACKERS] logical decoding of two-phase transactions