Re: Bitmap table scan cost per page formula

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Haisheng Yuan <hyuan(at)pivotal(dot)io>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap table scan cost per page formula
Date: 2017-12-21 01:03:47
Message-ID: 9861.1513818227@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> It is not obvious to me that the parabola is wrong. I've certainly seen
>> cases where reading every 2nd or 3rd block (either stochastically, or
>> modulus) actually does take longer than reading every block, because it
>> defeats read-ahead. But it depends a lot on your kernel version and
>> your kernel settings and your file system and probably other things as well.

> Well, that's an interesting point, too.

Yes. I think the proposed new curve is flat wrong: for areas near the
middle of the graph, the estimated I/O cost for a bitmap scan should
*substantially* exceed the cost of a seqscan, because in that regime
you're reading a lot of pages but you are probably failing to activate
the kernel's readahead heuristics. This is especially true if you
consider that random_page_cost >> seq_page_cost as the Greenplum folk
seem to want.
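
For reference, the interpolation the parabola would replace is the sqrt
formula in cost_bitmap_heap_scan() in costsize.c, roughly this (paraphrased,
not verbatim):

#include <math.h>

/* Paraphrase of the existing per-page interpolation: charge
 * random_page_cost apiece when few pages are fetched, sliding toward
 * seq_page_cost as pages_fetched approaches T (total pages in table).
 */
static double
bitmap_cost_per_page(double pages_fetched, double T,
                     double random_page_cost, double seq_page_cost)
{
    if (pages_fetched >= 2.0)
        return random_page_cost -
            (random_page_cost - seq_page_cost) * sqrt(pages_fetched / T);
    return random_page_cost;
}

That declines monotonically and reaches seq_page_cost exactly at
pages_fetched = T, so the mid-range surcharge over a seqscan is
fairly modest already.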

The parabola is probably wrong in detail --- its behavior as we approach
reading all of the pages ought to be more asymptotic, seems like.
I suppose that the reason it appears to go below the seqscan cost at the
right is that even the rightmost end of the curve doesn't reflect reading
all of the *tuples*, just one tuple per page, so that there's some CPU
savings from not inspecting all the tuples. Once you approach needing to
read all the tuples, the curve ought to go back higher than seqscan cost.
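
To make the shape argument concrete, compare these two illustrative
curves (invented for illustration; the patch's actual coefficients are
upthread and not reproduced here):

/* A quadratic reaches seq_page_cost exactly at x = 1, while an
 * asymptotic form stays strictly above it.  x is the fraction of the
 * table's pages being read.
 */
static double
quadratic_cost_per_page(double pages_fetched, double T,
                        double random_page_cost, double seq_page_cost)
{
    double x = pages_fetched / T;

    return seq_page_cost +
        (random_page_cost - seq_page_cost) * (1.0 - x) * (1.0 - x);
}

static double
asymptotic_cost_per_page(double pages_fetched, double T,
                         double random_page_cost, double seq_page_cost)
{
    double x = pages_fetched / T;

    /* the constant 9.0 is arbitrary; still above seq_page_cost at x = 1 */
    return seq_page_cost +
        (random_page_cost - seq_page_cost) / (1.0 + 9.0 * x);
}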

The point upthread about perhaps wanting to consider index correlation is
interesting too. I think that correlation as currently defined is the
wrong stat for that purpose: what we are interested in for a bitmap scan
is clumping, not ordering. If reading 10% of the index is likely to
result in accessing a tightly packed 10% of the table, rather than
roughly-one-page-in-10, then we ideally ought to be costing that as
sequential access not random access. It's true that perfect correlation
would guarantee good clumping as well as good ordering, but it seems like
that's too strong an assumption. An index with almost zero correlation
might still have pretty good clumping behavior.
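
If someone wanted to experiment, a crude clumping measure (hypothetical;
not an existing planner statistic) might be the fraction of fetched heap
blocks that immediately follow the previous one:

/* Hypothetical clumping metric: given the sorted heap block numbers a
 * bitmap scan would visit, return the fraction of fetches that land on
 * the block right after the previous one.  Near 1 argues for costing
 * the pages as sequential access, near 0 as random access.
 */
static double
clumping_fraction(const unsigned int *blocks, int nblocks)
{
    int         adjacent = 0;

    for (int i = 1; i < nblocks; i++)
    {
        if (blocks[i] == blocks[i - 1] + 1)
            adjacent++;
    }
    return (nblocks > 1) ? (double) adjacent / (nblocks - 1) : 0.0;
}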

regards, tom lane
