
Re: Sunfire X4500 recommendations

From: "Matt Smiley" <mss(at)rentrak(dot)com>
To: <dimitrik(dot)fr(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sunfire X4500 recommendations
Date: 2007-03-28 04:44:42
Lists: pgsql-performance

Hi Dimitri,

First of all, thanks again for the great feedback!

Yes, my I/O load is mostly read operations.  There are some bulk writes done in the background periodically throughout the day, but these are not as time-sensitive.  I'll have to do some testing to find the best balance of read vs. write speed and tolerance of disk failure vs. usable diskspace.

I'm looking forward to seeing the results of your OLTP tests!  Good luck!  Since I won't be doing that myself, it'll be all new to me.

About disk failure, I certainly agree that increasing the number of disks will decrease the average time between disk failures.  Apart from any performance considerations, I wanted to get a clear idea of the risk of data loss under various RAID configurations.  It's a handy reference, so I thought I'd share it:


The goal is to calculate the probability of data loss when we lose a certain number of disks within a short timespan (e.g. losing a 2nd disk before replacing and rebuilding the 1st one).  For RAID 10, 50, and Z, we will lose data if any disk group (i.e. mirror or parity group) loses 2 disks.  For RAID 60 and Z2, we will lose data if 3 disks die in the same parity group.  The parity groups can include arbitrarily many disks.  Having larger groups gives us more usable diskspace but less protection.  (Naturally we're more likely to lose 2 disks in a group of 50 than in a group of 5.)

    g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or more; dual-parity = 4 or more)
    n = total number of disks
    risk that a failure hits one specific disk = 1/n
    risk that a failure hits a disk in one particular group = g/n
    risk of losing 2 disks in that same group = g/n * (g-1)/(n-1)
    risk of losing 3 disks in that same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)
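
As a rough sketch, the formulas above can be written out in Python.  The function names are my own, not from any library.  The first function is the formula as given, for one particular group.  The second is an extra step that is only an assumption about what one might also want: because the groups are disjoint, summing over all n/g groups gives the chance that the failed disks land together in any group.

```python
from fractions import Fraction


def risk_in_one_group(g: int, n: int, failures: int) -> Fraction:
    """Chance that `failures` failed disks all sit in one particular
    group of g disks, out of n disks total (formula from the list above)."""
    risk = Fraction(1)
    for i in range(failures):
        risk *= Fraction(g - i, n - i)
    return risk


def risk_in_any_group(g: int, n: int, failures: int) -> Fraction:
    """Assumed extension: the groups are disjoint, so the per-group
    chances simply add up across all n // g groups."""
    return (n // g) * risk_in_one_group(g, n, failures)


if __name__ == "__main__":
    # 48 disks arranged as 24 two-disk mirrors, 2 disks failing
    print(f"{float(risk_in_one_group(2, 48, 2)):.2%}")   # 0.09%
    print(f"{float(risk_in_any_group(2, 48, 2)):.2%}")
```

Using Fraction keeps the arithmetic exact until the final formatting step.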

For the x4500, we have 48 disks.  If we stripe our data across all those disks, then these are our configuration options:

RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same group to lose data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48            24              0.09%
              3          16           48            32              0.27%
              4          12           48            36              0.53%
              6           8           48            40              1.33%
              8           6           48            42              2.48%
             12           4           48            44              5.85%
             24           2           48            46             24.47%
             48           1           48            47            100.00%

RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48           n/a                n/a
              3          16           48            16              0.01%
              4          12           48            24              0.02%
              6           8           48            32              0.12%
              8           6           48            36              0.32%
             12           4           48            40              1.27%
             24           2           48            44             11.70%
             48           1           48            46            100.00%
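
For anyone who wants to rerun these numbers with a different disk count, here is a minimal sketch that regenerates both tables.  The name raid_table and the "redundancy" parameter (disks' worth of redundancy per group: 1 for mirroring or single parity, 2 for double parity) are my own labels.  Note that the risk_of_data_loss column matches the formula for one particular group.  The last column printed here is an extra, assumed refinement: the per-group figure multiplied by num_groups, i.e. the chance that the failed disks share any group.

```python
from math import prod

GROUP_SIZES = (2, 3, 4, 6, 8, 12, 24, 48)


def raid_table(n, redundancy):
    """Return rows of (disks_per_group, num_groups, usable_disks,
    risk_one_group, risk_any_group) for n disks striped across equal groups."""
    fatal = redundancy + 1              # failures in one group that lose data
    rows = []
    for g in GROUP_SIZES:
        if g <= redundancy:             # e.g. double parity on 2-disk groups: n/a
            continue
        groups = n // g
        usable = n - redundancy * groups
        one_group = prod((g - i) / (n - i) for i in range(fatal))
        rows.append((g, groups, usable, one_group, groups * one_group))
    return rows


if __name__ == "__main__":
    for label, redundancy in (("RAID 10 or 50", 1), ("RAID 60 or Z2", 2)):
        print(label)
        print("disks_per_group  num_groups  usable_disks  one_group  any_group")
        for g, groups, usable, one, any_ in raid_table(48, redundancy):
            print(f"{g:15d}  {groups:10d}  {usable:12d}  {one:9.2%}  {any_:9.2%}")
```

Whichever column is used, the ordering between configurations stays the same: smaller groups and more parity both reduce the risk.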

So, in terms of fault tolerance:
 - RAID 60 and Z2 always beat RAID 10, since they never risk data loss when only 2 disks fail.
 - RAID 10 always beats RAID 50 and Z, since it has the largest number of disk groups across which to spread the risk.
 - Having more parity groups increases fault tolerance but decreases usable diskspace.

That's all assuming each disk has an equal chance of failure, which is probably true since striping should distribute the workload evenly.  And again, these probabilities are only describing the case where we don't have enough time between disk failures to recover the array.

In terms of performance, I think RAID 10 should always be best for write speed.  (Since it doesn't calculate parity, writing a new block doesn't require reading the rest of the RAID stripe just to recalculate the parity bits.)  I think it's also normally just as fast for reading, since the controller can load-balance the pending read requests to both sides of each mirror.
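
To illustrate the parity overhead, here is a toy sketch using XOR parity over small integers as stand-ins for disk blocks.  It is not how any particular controller or ZFS implements writes, just the arithmetic.  Updating one block in a parity stripe needs reads first: either the rest of the stripe, or the old block plus the old parity.  A mirror would simply write the new block twice.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR parity across the data blocks of one stripe."""
    return reduce(xor, blocks)


# Toy stripe: three data blocks plus one parity block
stripe = [0b1010, 0b0110, 0b1111]
old_parity = parity(stripe)

index, new_block = 1, 0b0001

# Option 1: read the rest of the stripe and recompute parity from scratch
others = [b for i, b in enumerate(stripe) if i != index]
new_parity_full = parity(others + [new_block])

# Option 2: read-modify-write, using only the old block and the old parity
new_parity_rmw = old_parity ^ stripe[index] ^ new_block

# Commit the write; both options must agree on the new parity
stripe[index] = new_block
assert new_parity_full == new_parity_rmw == parity(stripe)
```

Either way the parity layout has to read before it can write, which is the extra work a mirror avoids.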


