Re: Bitmap table scan cost per page formula

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haisheng Yuan <hyuan(at)pivotal(dot)io>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap table scan cost per page formula
Date: 2017-12-21 04:51:04
Message-ID: CAMkU=1xaLs8RnWrY_jLAc_ON--5sT0-QCEbBcQCjR5oxMHc4tg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>> It is not obvious to me that the parabola is wrong. I've certainly seen
>> cases where reading every 2nd or 3rd block (either stochastically, or by
>> modulus) actually does take longer than reading every block, because it
>> defeats read-ahead. But it depends a lot on your kernel version and
>> your kernel settings and your file system and probably other things as well.
>>
>
> Well, that's an interesting point, too. Maybe we need another graph that
> also shows the actual runtime of a bitmap scan and a sequential scan.
>

I did some low-level IO benchmarking, and reading every 3rd block actually
comes out 13 times slower than reading every block, on CentOS 6.9 with ext4
and the setting:
blockdev --setra 8192 /dev/sdb1
This is on some virtualized storage whose details I don't know, but which
behaves as if it were RAID/JBOD with around 6 independent spindles.
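(For anyone comparing setups, blockdev --getra /dev/sdb1 reports the current
value; the 8192 is in 512-byte sectors, so that works out to 4MB of
readahead.)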

If I pick the 1/3 of the blocks to read stochastically rather than by
modulus, it is only 2 times slower than reading all of them, I assume
because the occasional reads which happen to be adjacent to each other make
read-ahead kick in, while evenly spaced, never-adjacent reads do not (a
quick simulation of this is below, after the timing runs). This is probably
a better model of how bitmap table scans actually work, as there is no
general reason to think the pages they fetch would be evenly spaced and
non-adjacent. So this result is in reasonable agreement with how the
current cost estimation works, where the parabola peaks at about twice the
cost of the sequential scan.
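
For reference, here is the cost-per-page interpolation I'm referring to,
sketched in Perl as I read cost_bitmap_heap_scan in costsize.c (paraphrased
from the C, not quoted verbatim; the constants are the default
random_page_cost and seq_page_cost GUCs):

# Sketch of the current cost-per-page interpolation, paraphrasing
# cost_bitmap_heap_scan in costsize.c (not a verbatim quote).
my ($random_page_cost, $seq_page_cost) = (4.0, 1.0);  # default GUC values
sub cost_per_page {
    my ($pages_fetched, $T) = @_;   # $T = total pages in the heap
    return $random_page_cost if $pages_fetched < 2.0;
    return $random_page_cost
         - ($random_page_cost - $seq_page_cost) * sqrt($pages_fetched / $T);
}
# pages_fetched * cost_per_page(...) traces out the parabola-like
# total-cost curve this thread is about.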

I used a file of about 10GB, because I happened to have one lying around.

## read every block ($_%3>5 is never true)
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x="";
next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024;
print length $x} '|uniq -c

1295683 8192
4317 0
real 0m38.329s

## read every 3rd block
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x="";
next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024;
print length $x} '|uniq -c

431894 8192
1439 0
real 8m54.698s

## read a random 1/3 of blocks
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -wle 'open my $fh, "rand" or die; foreach (1..1300000) {$x="";
next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh,
$x,8*1024; print length $x} '|uniq -c

431710 8192
1434 0
real 1m23.130s
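
To put a number on the adjacency effect: with the modulus scheme no two
reads are ever adjacent, while with rand() about a third of the reads land
immediately after another read, which I assume is what lets read-ahead kick
in occasionally. A quick simulation (just counting, no IO) to check that:

# Count what fraction of read blocks immediately follow another read
# block, under the modulus and the random selection schemes.
my $n = 1_300_000;
my ($prev_mod, $prev_rand) = (0, 0);
my ($adj_mod, $adj_rand, $picked_mod, $picked_rand) = (0, 0, 0, 0);
foreach my $i (1 .. $n) {
    my $mod = ($i % 3 == 0) ? 1 : 0;    # every 3rd block
    my $rnd = (rand() < 1/3) ? 1 : 0;   # random 1/3 of blocks
    if ($mod) { $picked_mod++;  $adj_mod++  if $prev_mod; }
    if ($rnd) { $picked_rand++; $adj_rand++ if $prev_rand; }
    ($prev_mod, $prev_rand) = ($mod, $rnd);
}
printf "modulus: %.3f adjacent, random: %.3f adjacent\n",
       $adj_mod / $picked_mod, $adj_rand / $picked_rand;
# modulus gives 0.000; random gives ~0.333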

Dropping the caches is a reasonable way to do this type of benchmark,
because it simulates what would happen if your data set were much larger
than RAM, without needing to actually use a data set much larger than RAM.

It would be interesting to see what other people get for similar low-level
tests, as well as for actual bitmap scans.
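
In case it helps anyone reproduce this, here are the one-liners above
wrapped up as a small script (file name, block count, and selection mode as
arguments; the paths and sizes are from my setup, so adjust to taste):

#!/usr/bin/perl
# The benchmark one-liners above, parameterized.
# Usage: readtest.pl FILE NBLOCKS MODE   (MODE: all | mod3 | rand)
# Remember to drop the page cache first:
#   sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
use strict;
use warnings;

my ($file, $nblocks, $mode) = @ARGV;
die "usage: $0 FILE NBLOCKS all|mod3|rand\n"
    unless $file and $nblocks and $mode and $mode =~ /^(all|mod3|rand)$/;
open my $fh, '<', $file or die "$file: $!";
my %tally;    # bytes-returned => count, same info as the uniq -c output
foreach my $i (1 .. $nblocks) {
    next if $mode eq 'mod3' and $i % 3 > 0;      # every 3rd block only
    next if $mode eq 'rand' and rand() > 1/3;    # random 1/3 of blocks
    my $x = '';
    sysseek $fh, $i * 8 * 1024, 0 or die $!;
    sysread $fh, $x, 8 * 1024;
    $tally{length $x}++;
}
print "$tally{$_} reads of $_ bytes\n" for sort { $b <=> $a } keys %tally;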

Cheers,

Jeff
