Re: Sequential I/O Cost (was Re: A Better External Sort?)

From: Ron Peacetree <rjpeace(at)earthlink(dot)net>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential I/O Cost (was Re: A Better External Sort?)
Date: 2005-09-29 07:42:54
Message-ID: 2944051.1127979774218.JavaMail.root@elwamui-polski.atl.sa.earthlink.net
Lists: pgsql-performance

>From: "Jeffrey W. Baker" <jwbaker(at)acm(dot)org>
>Sent: Sep 29, 2005 12:33 AM
>Subject: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?)
>
>On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote:
>>>From: "Jeffrey W. Baker" <jwbaker(at)acm(dot)org>
>>>Perhaps I believe this because you can now buy as much sequential I/O
>>>as you want. Random I/O is the only real savings.
>>>
>> 1= No, you can not "buy as much sequential IO as you want". Even if
>> with an infinite budget, there are physical and engineering limits. Long
>> before you reach those limits, you will pay exponentially increasing costs
>> for linearly increasing performance gains. So even if you _can_ buy a
>> certain level of sequential IO, it may not be the most efficient way to
>> spend money.
>
>This is just false. You can buy sequential I/O for linear money up to
>and beyond your platform's main memory bandwidth. Even 1GB/sec will
>severely tax memory bandwidth of mainstream platforms, and you can
>achieve this rate for a modest cost.
>
I don't think you can prove this statement.
A= www.pricewatch.com lists 7200rpm 320GB SATA II HDs for ~$160.
ASTR (average sustained transfer rate) according to www.storagereview.com
is ~50MBps. Average access time is ~12-13ms.
Absolute top-of-the-line 15Krpm 147GB U320 or FC HDs cost ~4x as much per GB,
yet only deliver ~80-90MBps ASTR and average access times of
~5.5-6.0ms.
Your statement is clearly false in terms of atomic raw HD performance.

B= Low end RAID controllers can be obtained for a few hundred dollars. But
even amongst them, a $600+ card does not perform 3-6x better than a
$100-$200 card. When the low end HW is not enough, the next step in
price is to ~$10K+ (e.g. Xyratex), and the ones after that are to ~$100K+
(e.g. NetApp) and ~$1M+ (e.g. EMC, IBM, etc). None of these ~10x steps
in price results in a ~10x increase in performance.
Your statement is clearly false in terms of HW based RAID performance.

C= A commodity AMD64 mainboard with a dual channel DDR PC3200
RAM subsystem has 6.4GBps of bandwidth. These are as common
as weeds and almost as cheap: www.pricewatch.com
Your statement about commodity systems main memory bandwidth
being "severely taxed at 1GBps" is clearly false.
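
The 6.4GBps figure in "C" can be checked with a few lines of arithmetic. This is only a sketch of theoretical peak bandwidth: it assumes PC3200 means DDR-400 (400 million transfers/s on a 64-bit channel), and it ignores the extra memory traffic from buffer copies, so real headroom is smaller. The function name is made up for illustration.

```python
def ddr_peak_mb_per_s(mega_transfers_per_s, bus_bytes=8, channels=1):
    """Theoretical peak bandwidth in MB/s: transfers/s * bytes per transfer * channels."""
    return mega_transfers_per_s * bus_bytes * channels

# Assumption: PC3200 = DDR-400, i.e. 400 MT/s on a 64-bit (8 byte) channel.
peak = ddr_peak_mb_per_s(400, bus_bytes=8, channels=2)

# A 1GBps (1000 MB/s) I/O stream as a fraction of that peak.
io_stream = 1000
fraction = io_stream / peak

print(f"dual channel PC3200 peak: {peak} MB/s")
print(f"1GBps stream uses {fraction:.1%} of peak")
```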

D= Xyratex makes RAID HW for NetApp and EMC. NONE of their
current HW can deliver 1GBps. More like 600-700MBps. Engenio and
Dot Hill have similar limitations on their current products. No PCI or
PCI-X based HW could ever do more than ~800-850MBps since
that's the RW limit of those busses. Next Gen products are likely to
2x those limits and cross the 1GBps barrier based on ~90MBps SAS
or FC HD's and PCI-Ex8 (2GBps max) and PCI-Ex16 (4GBps max).
Note that not even next gen or 2 gens from now RAID HW will be
able to match the memory bandwidth of the current commodity
memory subsystem mentioned in "C" above.
Your statement that one can achieve a HD IO rate that will tax RAM
bandwidth at modest cost is clearly false.

QED Your statement is false on all counts and in all respects.

>I have one array that can supply this rate and it has only 15 disks. It
>would fit on my desk. I think your dire talk about the limits of
>science and engineering may be a tad overblown.
>
Name it and post its BOM, configuration specs, price and ordering
information. Then tell us what it's plugged into and all the same
details on _that_.

If all 15 HD's are being used for one RAID set, then you can't be
using RAID 10 (mirrored pairs need an even number of drives), which
means any claims re: write performance in particular should be
closely examined.

A 15 volume RAID 5 made of the fastest 15Krpm U320 or FC HDs,
each with ~85.9MBps ASTR, could in theory do ~14*85.9=
~1.2GBps raw ASTR for at least reads, but no one I know of makes
commodity RAID HW that can keep up with this, nor can any one
PCI-X bus support it even if such commodity RAID HW did exist.
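
The estimate above can be written out as a small calculation. This is a sketch using only the numbers already quoted in this message (15 drives, ~85.9MBps each, ~850MBps as the practical PCI-X ceiling); the (n-1) data-drive model for RAID 5 reads and the names used are simplifying assumptions, not measurements.

```python
def raid5_raw_read_mb_per_s(drives, per_drive_mb_per_s):
    """Rough RAID 5 streaming estimate: one drive's worth of capacity holds parity."""
    return (drives - 1) * per_drive_mb_per_s

# Figures quoted in the message; PCI_X_LIMIT is the upper end of ~800-850MBps.
DRIVES = 15
PER_DRIVE = 85.9
PCI_X_LIMIT = 850.0

raw = raid5_raw_read_mb_per_s(DRIVES, PER_DRIVE)

# The array can deliver no more than the slowest link in the path.
deliverable = min(raw, PCI_X_LIMIT)
shortfall = raw - deliverable

print(f"raw array ASTR:   {raw:.1f} MB/s")
print(f"through one bus:  {deliverable:.1f} MB/s")
print(f"left on the table: {shortfall:.1f} MB/s")
```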

Hmmm. SW RAID on at least a PCI-Ex8 bus might be able to do it if
we can multiplex enough 4Gbps FC lines (4Gbps= 400MBps => max
of 4 of the above HDs per line and 4 FC lines) with low enough latency
and have enough CPU driving it... It won't be easy or cheap, though.
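
A rough sizing of that FC layout, as a sketch only. It uses the figures from this message (400MBps per 4Gbps FC line, ~85.9MBps per drive, 2GBps for PCI-Ex8) and ignores protocol overhead, latency and CPU cost, which are exactly the hard parts. The variable names are illustrative.

```python
import math

# Figures quoted in the message (all MB/s).
FC_LINE = 400.0
PER_DRIVE = 85.9
PCIE_X8 = 2000.0
DRIVES = 15

# How many drives one FC line can feed without becoming the bottleneck.
drives_per_line = int(FC_LINE // PER_DRIVE)

# FC lines needed to attach all drives at that density.
lines_needed = math.ceil(DRIVES / drives_per_line)

# Aggregate streaming rate, using the same (n-1) RAID 5 estimate as above.
aggregate = (DRIVES - 1) * PER_DRIVE
fits_on_bus = aggregate <= PCIE_X8

print(f"{drives_per_line} drives per line, {lines_needed} lines needed")
print(f"aggregate {aggregate:.1f} MB/s, fits on PCI-Ex8: {fits_on_bus}")
```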
