Re: Sequential Scan Read-Ahead

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Kyle <kaf(at)nwlink(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sequential Scan Read-Ahead
Date: 2002-04-26 02:18:47
Message-ID: 200204260218.g3Q2Ili11246@candle.pha.pa.us
Lists: pgsql-hackers


Nice test. Would you test simultaneous 'dd' on the same file, perhaps
with a slight delay between the two so they don't read each other's
blocks?
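
A minimal sketch of that staggered test, written in POSIX sh rather than
the csh used in the quoted runs. The function name staggered_read, the
delay argument, and the use of `date +%s` for timing are assumptions for
illustration; nobody in this thread ran this script. The file has to be
larger than main memory, as with Kyle's 2 GB file, or the second reader
is simply served from cache.

```sh
# Sketch: two sequential readers on the SAME file, the second started
# a few seconds after the first so they work on different blocks.
# staggered_read is a hypothetical helper, not part of any tool.
staggered_read() {
    file=$1     # file to read (should exceed RAM for a real test)
    bs=$2       # dd block size, e.g. 8k
    delay=$3    # seconds to wait before starting the second reader

    start=$(date +%s)

    dd bs="$bs" if="$file" of=/dev/null 2>/dev/null &
    first=$!
    sleep "$delay"
    dd bs="$bs" if="$file" of=/dev/null 2>/dev/null &
    second=$!

    # Collect the exit status of each reader separately.
    wait "$first";  rc1=$?
    wait "$second"; rc2=$?

    end=$(date +%s)
    echo "$bs: $((end - start)) s elapsed"

    # Succeed only if both readers succeeded.
    [ "$rc1" -eq 0 ] && [ "$rc2" -eq 0 ]
}
```

Calling it once per block size, for example
`staggered_read big_file1 8k 5`, gives one wall-clock figure per run.
The timing has whole-second resolution only, which should be enough at
the multi-minute run times reported in the quoted results.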

A seek() in the file will turn off read-ahead in most OSes. I am not
saying this is a major issue for PostgreSQL, but the numbers would be
interesting.

---------------------------------------------------------------------------

Kyle wrote:
> Tom Lane wrote:
> > ...
> > Curt Sampson <cjs(at)cynic(dot)net> writes:
> > > 3. Proof by testing. I wrote a little ruby program to seek to a
> > > random point in the first 2 GB of my raw disk partition and read
> > > 1-8 8K blocks of data. (This was done as one I/O request.) (Using
> > > the raw disk partition I avoid any filesystem buffering.)
> >
> > And also ensure that you aren't testing the point at issue.
> > The point at issue is that *in the presence of kernel read-ahead*
> > it's quite unclear that there's any benefit to a larger request size.
> > Ideally the kernel will have the next block ready for you when you
> > ask, no matter what the request is.
> > ...
>
> I have to agree with Tom. I think the numbers below show that with
> kernel read-ahead, block size isn't an issue.
>
> The big_file1 file used below is 2.0 GB of random data, and the
> machine has 512 MB of main memory. This ensures that we're not
> just getting cached data.
>
> foreach i (4k 8k 16k 32k 64k 128k)
>     echo $i
>     time dd bs=$i if=big_file1 of=/dev/null
> end
>
> and the results:
>
> bs      user   kernel   elapsed
> 4k:    0.260    7.740   1:27.25
> 8k:    0.210    8.060   1:30.48
> 16k:   0.090    7.790   1:30.88
> 32k:   0.060    8.090   1:32.75
> 64k:   0.030    8.190   1:29.11
> 128k:  0.070    9.830   1:28.74
>
> So with kernel read-ahead, we have basically the same elapsed (wall
> clock) time regardless of block size. Sure, user time drops to a low
> at 64k blocksize, but kernel time is increasing.
>
>
> You could argue that this is a contrived example, since no other I/O
> is being done. Well, I created a second 2.0 GB file (big_file2) and
> did two simultaneous reads from the same disk. Sure, performance went
> to hell, but it shows blocksize is still irrelevant in a multi I/O
> environment with sequential read-ahead.
>
> foreach i ( 4k 8k 16k 32k 64k 128k )
>     echo $i
>     time dd bs=$i if=big_file1 of=/dev/null &
>     time dd bs=$i if=big_file2 of=/dev/null &
>     wait
> end
>
> bs      user   kernel   elapsed
> 4k:    0.480    8.290   6:34.13   big_file1
>        0.320    8.730   6:34.33   big_file2
> 8k:    0.250    7.580   6:31.75
>        0.180    8.450   6:31.88
> 16k:   0.150    8.390   6:32.47
>        0.100    7.900   6:32.55
> 32k:   0.190    8.460   6:24.72
>        0.060    8.410   6:24.73
> 64k:   0.060    9.350   6:25.05
>        0.150    9.240   6:25.13
> 128k:  0.090   10.610   6:33.14
>        0.110   11.320   6:33.31
>
>
> The differences in read times are basically in the noise. Blocksize
> just doesn't matter much with the kernel doing read-ahead.
>
> -Kyle

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
