Re: Sequential Scan Read-Ahead

From: Kyle <kaf(at)nwlink(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sequential Scan Read-Ahead
Date: 2002-04-26 00:40:53
Message-ID: 15560.41493.529847.635632@doppelbock.patentinvestor.com
Lists: pgsql-hackers

Tom Lane wrote:
> ...
> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > 3. Proof by testing. I wrote a little ruby program to seek to a
> > random point in the first 2 GB of my raw disk partition and read
> > 1-8 8K blocks of data. (This was done as one I/O request.) (Using
> > the raw disk partition I avoid any filesystem buffering.)
>
> And also ensure that you aren't testing the point at issue.
> The point at issue is that *in the presence of kernel read-ahead*
> it's quite unclear that there's any benefit to a larger request size.
> Ideally the kernel will have the next block ready for you when you
> ask, no matter what the request is.
> ...
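
(For concreteness, the test Curt describes amounts to roughly the
following sketch; it is not his actual Ruby program, and /dev/rsd0c is
just a placeholder for whatever the raw partition is called on your
system.)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLKSZ 8192
#define NBLKS (2147483648UL / BLKSZ)     /* first 2 GB of the partition */

int main(void)
{
    char buf[8 * BLKSZ];
    long blkno;
    int  nblk, fd;

    srandom(time(NULL));
    fd = open("/dev/rsd0c", O_RDONLY);   /* placeholder raw device */
    if (fd < 0) { perror("open"); return 1; }

    nblk  = 1 + random() % 8;            /* read 1-8 blocks ...       */
    blkno = random() % (NBLKS - nblk);   /* ... at a random 8K offset */

    lseek(fd, (off_t) blkno * BLKSZ, SEEK_SET);
    read(fd, buf, nblk * BLKSZ);         /* issued as one I/O request */

    close(fd);
    return 0;
}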

I have to agree with Tom. I think the numbers below show that with
kernel read-ahead, block size isn't an issue.

The big_file1 file used below is 2.0 GB of random data, and the
machine has 512 MB of main memory. This ensures that we're not
just reading back cached data.

foreach i (4k 8k 16k 32k 64k 128k)
echo $i
time dd bs=$i if=big_file1 of=/dev/null
end

and the results:

bs      user(s)  kernel(s)  elapsed
4k:     0.260    7.740      1:27.25
8k:     0.210    8.060      1:30.48
16k:    0.090    7.790      1:30.88
32k:    0.060    8.090      1:32.75
64k:    0.030    8.190      1:29.11
128k:   0.070    9.830      1:28.74

So with kernel read-ahead, we get basically the same elapsed (wall-clock)
time regardless of block size. Sure, user time drops to a low at the 64k
block size, but kernel time is increasing.

You could argue that this is a contrived example since no other I/O is
being done. So I created a second 2.0 GB file (big_file2) and did two
simultaneous reads from the same disk. Sure, performance went to hell,
but it shows that block size is still irrelevant in a multi-I/O
environment with sequential read-ahead.

foreach i ( 4k 8k 16k 32k 64k 128k )
echo $i
time dd bs=$i if=big_file1 of=/dev/null &
time dd bs=$i if=big_file2 of=/dev/null &
wait
end

bs      user(s)  kernel(s)  elapsed
4k:     0.480    8.290      6:34.13   big_file1
        0.320    8.730      6:34.33   big_file2
8k:     0.250    7.580      6:31.75
        0.180    8.450      6:31.88
16k:    0.150    8.390      6:32.47
        0.100    7.900      6:32.55
32k:    0.190    8.460      6:24.72
        0.060    8.410      6:24.73
64k:    0.060    9.350      6:25.05
        0.150    9.240      6:25.13
128k:   0.090    10.610     6:33.14
        0.110    11.320     6:33.31

The differences in read times are basically in the mud. Block size
just doesn't matter much with the kernel doing read-ahead.
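
(If anyone wants to reproduce this without dd, each run above boils down
to something like the following sketch. The default file name and the
block-size argument, given in bytes, are just placeholders to change;
run it under time with different block sizes, the same as the dd loop
above.)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "big_file1";
    size_t      bs   = argc > 2 ? (size_t) atol(argv[2]) : 8192;
    char       *buf  = malloc(bs);
    long long   total = 0;
    ssize_t     n;
    int         fd = open(path, O_RDONLY);

    if (fd < 0 || buf == NULL) { perror(path); return 1; }

    /* Sequential read with a fixed request size; the kernel's
       read-ahead is what keeps the disk busy, not the size of bs. */
    while ((n = read(fd, buf, bs)) > 0)
        total += n;

    printf("read %lld bytes with bs=%lu\n", total, (unsigned long) bs);
    close(fd);
    free(buf);
    return 0;
}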

-Kyle
