Re: How to improve db performance with $7K?

From: Jacques Caron <jc(at)directinfos(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: William Yu <wyu(at)talisys(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How to improve db performance with $7K?
Date: 2005-04-18 17:41:49
Lists: pgsql-performance

At 16:59 18/04/2005, Greg Stark wrote:

>William Yu <wyu(at)talisys(dot)com> writes:
> > Using the above prices for a fixed budget for RAID-10, you could get:
> >
> > SATA 7200 -- 680MB per $1000
> > SATA 10K  -- 200MB per $1000
> > SCSI 10K  -- 125MB per $1000
>What a lot of these analyses miss is that cheaper == faster because cheaper
>means you can buy more spindles for the same price. I'm assuming you picked
>equal sized drives to compare so that 200MB/$1000 for SATA is almost twice as
>many spindles as the 125MB/$1000. That means it would have almost double the
>bandwidth. And the 7200 RPM case would have more than 5x the bandwidth.
>While 10k RPM drives have lower seek times, and SCSI drives have a natural
>seek time advantage, under load a RAID array with fewer spindles will start
>hitting contention sooner which results in higher latency. If the controller
>works well the larger SATA arrays above should be able to maintain their
>mediocre latency much better under load than the SCSI array with fewer drives
>would maintain its low latency response time despite its drives' lower average
>seek time.

I would definitely agree. More factors in favor of more cheap drives:
- cheaper drives (7200 rpm) have larger platters (3.7" diameter against 2.6 or
3.3). That means the outer tracks hold more data, and the same amount of
data is held on a smaller area, which means fewer tracks, which means
reduced seek times. You can roughly count the real average seek time as
(average seek time over full disk * size of dataset / capacity of disk).
And you actually need to physically seek less often too.

- more disks means less data per disk, which means the data is further 
concentrated on outer tracks, which means even lower seek times
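
The rule of thumb in the first point can be written down as a small Python sketch. The function name and the sample figures (8.5 ms full-disk seek, 80 GB drive, 40 GB dataset) are made up for illustration, and clamping the fraction at 1.0 is my own assumption for datasets larger than the disk:

```python
def effective_seek_ms(full_seek_ms, dataset_size, disk_capacity):
    """Rough average seek time when the data only occupies part of the disk.

    Implements: average seek over full disk * size of dataset / capacity of disk.
    dataset_size and disk_capacity must use the same unit (e.g. GB).
    """
    # Assumption: a dataset larger than the disk cannot seek "further"
    # than the full disk, so the fraction is clamped at 1.0.
    fraction = min(dataset_size / disk_capacity, 1.0)
    return full_seek_ms * fraction


# Hypothetical figures, for illustration only.
one_disk = effective_seek_ms(8.5, 40, 80)        # 40 GB on one 80 GB disk
four_disks = effective_seek_ms(8.5, 40 / 4, 80)  # same data striped over 4 disks
print(one_disk, four_disks)
```

Spreading the same dataset over four disks leaves a quarter of the data on each one, so the estimated seek time per disk drops by the same factor.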

Also, what counts is indeed not so much the time it takes to do one single 
random seek, but the number of random seeks you can do per second. Hence, 
more disks means more seeks per second (if requests are evenly distributed 
among all disks, which a good stripe size should achieve).

Not taking into account TCQ/NCQ or write cache optimizations, the important 
parameter (random seeks per second) can be approximated as:

N * 1000 / (lat + seek * ds / (N * cap))

N is the number of disks
lat is the average rotational latency in milliseconds (500/(rpm/60))
seek is the average seek over the full disk in milliseconds
ds is the dataset size
cap is the capacity of each disk (in the same unit as ds)
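
For anyone who wants to play with the numbers, here is the same approximation as a minimal Python sketch. The function and parameter names are mine, and like the formula it ignores TCQ/NCQ and write caching:

```python
def rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: 500/(rpm/60), i.e. half a turn."""
    return 500.0 / (rpm / 60.0)


def seeks_per_second(n_disks, rpm, full_seek_ms, dataset_size, disk_capacity):
    """Approximate random seeks per second for an array of identical disks.

    Implements N * 1000 / (lat + seek * ds / (N * cap)).
    dataset_size and disk_capacity must use the same unit (e.g. GB).
    """
    lat = rotational_latency_ms(rpm)
    # The dataset is assumed to be spread evenly over all disks, so each disk
    # only seeks across ds / (N * cap) of its surface.
    seek = full_seek_ms * dataset_size / (n_disks * disk_capacity)
    return n_disks * 1000.0 / (lat + seek)
```

Note that N appears twice: more disks multiply the number of seeks served in parallel and also shrink the seek term on each disk.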

Using this formula and a variety of disks, counting only the disks 
themselves (no enclosures, controllers, rack space, power, maintenance...), 
trying to maximize the number of seeks/second for a fixed budget (1000 
euros) with a dataset size of 100 GB makes SATA drives clear winners: you 
can get more than 4000 seeks/second (with 21 x 80GB disks) where SCSI 
cannot even make it to the 1400 seeks/second point (with 8 x 36 GB disks). 
Results can vary quite a lot based on the dataset size, which illustrates 
the importance of "staying on the edges" of the disks. I'll try to make the 
analysis more complete by counting some of the "overhead" (obviously 21 
drives have a lot of other implications!), but I believe SATA drives still 
win in theory.
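
The fixed-budget comparison could be sketched roughly as follows in Python. This is not the actual calculation behind the figures above: the Drive fields, the two catalog entries and their prices and seek times are hypothetical placeholders, to be replaced with real price-list data. Since the formula grows with N, the best choice per model is simply as many drives as the budget allows:

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    rpm: int
    full_seek_ms: float   # average seek over the full disk
    capacity_gb: float
    price: float          # price per drive, same currency as the budget


def seeks_per_second(n, drive, dataset_gb):
    """N * 1000 / (lat + seek * ds / (N * cap)), with lat = 500/(rpm/60)."""
    lat = 500.0 / (drive.rpm / 60.0)
    return n * 1000.0 / (lat + drive.full_seek_ms * dataset_gb / (n * drive.capacity_gb))


def rank_for_budget(drives, budget, dataset_gb):
    """Return (seeks/s, number of drives, drive name) tuples, best first."""
    results = []
    for d in drives:
        n = int(budget // d.price)  # as many drives as the budget buys
        # Skip models that are unaffordable or cannot hold the dataset at all.
        if n == 0 or n * d.capacity_gb < dataset_gb:
            continue
        results.append((seeks_per_second(n, d, dataset_gb), n, d.name))
    return sorted(results, reverse=True)


# Hypothetical catalog: placeholder specs and prices, NOT real quotes.
catalog = [
    Drive("sata-7200-80gb", 7200, 8.0, 80, 50.0),
    Drive("scsi-10k-36gb", 10000, 5.0, 36, 125.0),
]
for rate, n, name in rank_for_budget(catalog, budget=1000, dataset_gb=100):
    print(f"{name}: {n} drives, about {rate:.0f} seeks/s")
```

Like the analysis above, this counts only the drives themselves and ignores RAID level, controllers and enclosures.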

It would be interesting to actually compare this to real-world (or 
nearly-real-world) benchmarks to measure the effectiveness of features like 
TCQ/NCQ etc.

