
Re: With 4 disks should I go for RAID 5 or RAID 10

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: david(at)lang(dot)hm
Cc: Fernando Hevia <fhevia(at)ip-tel(dot)com(dot)ar>, 'pgsql-performance' <pgsql-performance(at)postgresql(dot)org>
Subject: Re: With 4 disks should I go for RAID 5 or RAID 10
Date: 2007-12-26 21:54:00
Lists: pgsql-performance
david(at)lang(dot)hm wrote:
>> Thanks for the explanation David. It's good to know not only what but
>> also why. Still I wonder why reads hit all drives. Shouldn't only two
>> disks be read: the one with the data and the parity disk?
> no, because the parity is of the sort (A+B+C+P) mod X = 0
> so if X=10 (which means in practice that only the last decimal digit
> of anything matters, very convenient for examples)
> A=1, B=2, C=3, A+B+C=6, P=4, A+B+C+P=10=0
> if you read B and get 3 and P and get 4, you don't know if this is
> right or not unless you also read A and C (at which point you would
> get A+B+C+P=11=1=error)
I don't think this is correct. RAID 5 parity is XOR. The property of 
XOR is such that it doesn't matter what the other drives hold. You can 
write any block given either: 1) the block you are overwriting and the 
parity, or 2) all the other data blocks in the stripe. Now, it might be 
that option 2) is taken more often than option 1) for some complicated 
reasons, but it is NOT to check consistency. The array is assumed 
consistent until proven otherwise.
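To make the XOR property concrete, here is a toy sketch in Python (purely illustrative, not how any real RAID driver lays out stripes): with three data blocks A, B, C and parity P = A ^ B ^ C, a single block can be rewritten either by read-modify-write (old block + old parity) or by a full-stripe computation, and the resulting parity is identical either way.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-length byte strings together, element by element."""
    return bytes(reduce(lambda x, y: x ^ y, t) for t in zip(*blocks))

# A tiny 3-data-disk stripe with one parity block.
A = bytes([1, 2, 3])
B = bytes([4, 5, 6])
C = bytes([7, 8, 9])
P = xor_blocks(A, B, C)            # parity block

new_B = bytes([40, 50, 60])        # block being overwritten

# Option 1: read-modify-write -- needs only old B and old parity.
P_rmw = xor_blocks(P, B, new_B)

# Option 2: full-stripe write -- needs all the other data blocks.
P_full = xor_blocks(A, new_B, C)

assert P_rmw == P_full             # identical parity either way

# The same XOR reconstructs a lost block after a disk failure:
assert xor_blocks(A, new_B, P_rmw) == C
```

Either path yields a consistent stripe without reading the remaining disks for verification, which is the point above: the extra reads are a cost trade-off, not a consistency check.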

> in theory a system could get the same performance with a large 
> sequential read/write on raid5/6 as on a raid0 array of equivalent 
> size (i.e. same number of data disks, ignoring the parity disks) 
> because the OS could read the entire stripe in at once, do the 
> calculation once, and use all the data (or when writing, don't write 
> anything until you are ready to write the entire stripe, calculate the 
> parity and write everything once).
For the same number of drives, this cannot be possible. With 10 disks, 
on raid5, 9 disks hold data, and 1 holds parity. The theoretical maximum 
performance is only 9/10 of the 10/10 performance possible with RAID 0.
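As a quick sanity check on that ratio (a trivial calculation, assuming equal per-disk bandwidth and perfect striping), the sequential-throughput ceiling of RAID 5 relative to RAID 0 on the same spindles is just the fraction of disks that carry data:

```python
def raid5_seq_fraction(n_disks):
    """Fraction of RAID 0 sequential bandwidth available to RAID 5
    on the same number of spindles: (n-1) of n disks hold data."""
    return (n_disks - 1) / n_disks

assert raid5_seq_fraction(10) == 0.9    # the 10-disk case above
assert raid5_seq_fraction(4) == 0.75    # the 4-disk case in this thread
```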

> Unfortunately, in practice filesystems don't support this: they don't 
> do enough readahead to want to keep the entire stripe (so after they 
> read it all in they throw some of it away), they (mostly) don't know 
> where a stripe starts (and so intermingle different types of data on 
> one stripe and spread data across multiple stripes unnecessarily), and 
> they tend to do writes in small, scattered chunks (rather than 
> flushing an entire stripe's worth of data at once)
In my experience, this theoretical maximum is not attainable without 
significant write cache, and an intelligent controller, neither of which 
Linux software RAID seems to have by default. My situation was a bit 
worse in that I used applications that fsync(), plus ordered metadata 
journalling, which forces Linux software RAID to flush far more than it 
should - but the same system works very well with RAID 1+0.

>>> Linux software raid can do more than two disks in a mirror, so you
>>> may be able to get the added protection with raid 1 sets (again,
>>> probably not relevant to four drives), although there were bugs in
>>> this within the last six months or so, so you need to be sure your
>>> kernel is new enough to have the fix.
>> Well, this raises another question. Should I go for a single RAID 1+0
>> storing OS + Data + WAL files, or will I be better off with two RAID 1
>> arrays separating data from OS + WAL files?
> if you can afford the space, you are almost certainly better off 
> separating the WAL from the data (I think I've seen debates about 
> which is better, OS+data/WAL or data/OS+WAL, but very little 
> disagreement that either is better than combining them all)
I don't think there is a good answer to this question. If you can 
afford more drives, you could also afford to make your RAID 1+0 bigger. 
Splitting OS/DATA/WAL is only "absolute best" if you can arrange your 
three arrays such that their size is proportional to their access 
patterns. For example, in an overly simplified case, if OS sees 1/4 the 
load of DATA, and WAL 1/2 the load of DATA, then perhaps "best" is to 
have a two-disk RAID 1 for OS, a four-disk RAID 1+0 for WAL, and an 
eight-disk RAID 1+0 for DATA. This gives a total of 14 disks. :-)
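The 14-disk figure falls straight out of those assumed load ratios; a toy calculation (the ratios are from the simplified example, not a real workload model):

```python
# Hypothetical sizing sketch: array size proportional to access load,
# anchored on an 8-disk RAID 1+0 for DATA. Ratios assumed, not measured.
loads = {"DATA": 1.0, "WAL": 0.5, "OS": 0.25}
data_disks = 8

disks = {name: max(2, round(data_disks * share))  # at least 2 for a mirror
         for name, share in loads.items()}
disks = {name: n + n % 2 for name, n in disks.items()}  # mirrors need even counts

total = sum(disks.values())
assert disks == {"DATA": 8, "WAL": 4, "OS": 2}
assert total == 14
```

With only four drives total, of course, no such split is possible, which is the point of the next paragraph.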

In practice, if you have four drives and you try to split them into two 
plus two, you're going to find that two of the drives are more idle 
than the other two.

I have a fun setup - I use RAID 1 across all four drives for the OS, 
RAID 1+0 for the database, wal, and other parts, and RAID 0 for a 
"build" partition. :-)


Mark Mielke <mark(at)mielke(dot)cc>

