Re: With 4 disks should I go for RAID 5 or RAID 10

From: david(at)lang(dot)hm
To: Fernando Hevia <fhevia(at)ip-tel(dot)com(dot)ar>
Cc: "'pgsql-performance'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: With 4 disks should I go for RAID 5 or RAID 10
Date: 2007-12-26 22:52:20
Message-ID: Pine.LNX.4.64.0712261431490.11785@asgard.lang.hm
Lists: pgsql-performance

On Wed, 26 Dec 2007, Fernando Hevia wrote:

>> David Lang Wrote:
>>
>> with only four drives the space difference between raid 1+0 and raid 5
>> isn't that much, but when you do a write you must write to two drives (the
>> drive holding the data you are changing, and the drive that holds the
>> parity data for that stripe, possibly needing to read the old parity data
>> first, resulting in stalling for seek/read/calculate/seek/write since
>> the drive moves on after the read), when you read you must read _all_
>> drives in the set to check the data integrity.
>
> Thanks for the explanation David. It's good to know not only what but also
> why. Still I wonder why reads do hit all drives. Shouldn't only 2 disks be
> read: the one with the data and the parity disk?

no, because the parity is of the sort (A+B+C+P) mod X = 0

so if X=10 (which means in practice that only the last decimal digit of
anything matters, very convenient for examples)

A=1, B=2, C=3, A+B+C=6, P=4, A+B+C+P=10=0

if you read B and get 3 and P and get 4 you don't know if this is right or
not unless you also read A and C (at which point you would get
A+B+C+P=11=1=error)
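
The arithmetic above can be sketched in a few lines of Python. This is only
a toy model of the mod-10 example: the function names are made up for
illustration, and real RAID 5 implementations compute parity with XOR over
whole blocks rather than a decimal sum.

```python
def parity_digit(data, modulus=10):
    """Return P such that (sum(data) + P) mod modulus == 0."""
    return (-sum(data)) % modulus


def stripe_is_consistent(data, parity, modulus=10):
    """Check a stripe; this needs every data value, not just one of them."""
    return (sum(data) + parity) % modulus == 0


data = [1, 2, 3]                    # A, B, C from the example
p = parity_digit(data)              # P = 4
print(p, stripe_is_consistent(data, p))

corrupted = [1, 3, 3]               # B was read back as 3 instead of 2
print(stripe_is_consistent(corrupted, p))   # 1+3+3+4 = 11, last digit 1
```

Reading only B and P gives 3 and 4, which says nothing by itself; the
mismatch only shows up once A and C are included in the sum.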

>> for seek heavy workloads (which almost every database application is) the
>> extra seeks involved can be murder on your performance. if your workload
>> is large sequential reads/writes, and you can let the OS buffer things for
>> you, the performance of raid 5 is much better.
>
> Well, actually most of my application involves large sequential
> reads/writes. The memory available for buffering (4GB) isn't bad either, at
> least for my scenario. On the other hand I have got such strong posts
> against RAID 5 that I doubt to even consider it.

in theory a system could get the same performance with a large sequential
read/write on raid5/6 as on a raid0 array of equivalent size (i.e. same
number of data disks, ignoring the parity disks) because the OS could read
the entire stripe in at once, do the calculation once, and use all the
data (or when writing, don't write anything until you are ready to write
the entire stripe, calculate the parity and write everything once).
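
As a rough sketch of why writing a whole stripe at once is cheaper, the toy
class below counts I/Os for the two approaches. It reuses the mod-10 parity
from the earlier example; ToyStripe and its counters are hypothetical names
for illustration, not a model of any real RAID implementation, and it
assumes a small write reads both the old data and the old parity.

```python
MODULUS = 10  # same toy parity scheme as the mod-10 example


class ToyStripe:
    """One stripe of data values plus a parity value, with I/O counters."""

    def __init__(self, data):
        self.data = list(data)
        self.parity = (-sum(self.data)) % MODULUS
        self.reads = 0
        self.writes = 0

    def small_write(self, index, value):
        """Update one value: read old data and old parity, write both back."""
        old = self.data[index]
        old_parity = self.parity
        self.reads += 2
        self.data[index] = value
        self.parity = (old_parity - (value - old)) % MODULUS
        self.writes += 2

    def full_stripe_write(self, values):
        """Replace every value: parity is computed once, nothing is read."""
        if len(values) != len(self.data):
            raise ValueError("need exactly one value per data disk")
        self.data = list(values)
        self.parity = (-sum(self.data)) % MODULUS
        self.writes += len(self.data) + 1

    def is_consistent(self):
        return (sum(self.data) + self.parity) % MODULUS == 0


piecemeal = ToyStripe([1, 2, 3])
for i, v in enumerate([7, 8, 9]):
    piecemeal.small_write(i, v)

batched = ToyStripe([1, 2, 3])
batched.full_stripe_write([7, 8, 9])

print(piecemeal.reads, piecemeal.writes)   # three small writes
print(batched.reads, batched.writes)       # one full-stripe write
```

Both stripes end up with the same data and parity; the difference is only
in how many reads and writes it took to get there.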

Unfortunately in practice filesystems don't support this: they don't do
enough readahead to want to keep the entire stripe (so after they read it
all in they throw some of it away), they (mostly) don't know where a
stripe starts (and so intermingle different types of data on one stripe
and spread data across multiple stripes unnecessarily), and they tend to do
writes in small, scattered chunks (rather than flushing an entire stripe's
worth of data at once).
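
To make the alignment point concrete, here is a small sketch that works out
which stripes a write touches and whether it covers each one completely.
The geometry (64 KiB chunks on three data disks, so a 192 KiB stripe) and
the function name are assumptions chosen for illustration only.

```python
def stripes_touched(offset, length, stripe_size):
    """Return (stripe_index, fully_covered) for each stripe a write hits."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    touched = []
    for s in range(first, last + 1):
        start = max(offset, s * stripe_size)
        end = min(offset + length, (s + 1) * stripe_size)
        touched.append((s, end - start == stripe_size))
    return touched


CHUNK = 64 * 1024          # assumed chunk size per disk
STRIPE = 3 * CHUNK         # assumed three data disks per stripe

# A stripe-sized write that starts on a stripe boundary
print(stripes_touched(0, STRIPE, STRIPE))

# The same amount of data, shifted by one chunk
print(stripes_touched(CHUNK, STRIPE, STRIPE))
```

The aligned write covers one stripe completely, so parity could be
calculated from the new data alone. The shifted write partially covers two
stripes, and each partial stripe needs existing data or parity read back
before the new parity can be written.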

those who have been around long enough to remember the days of MFM/RLL
(when you could still find the real layout of the drives) may remember
optimizing things to work a track at a time instead of a sector at a time.
this is the exact same logic, just needing to be applied to drive stripes
instead of sectors and tracks on a single drive.

the issue has been raised with the kernel developers, but there's a lot of
work to be done (especially in figuring out how to get all the layers the
info they need in a reasonable way)

>> Linux software raid can do more than two disks in a mirror, so you may be
>> able to get the added protection with raid 1 sets (again, probably not
>> relevant to four drives), although there were bugs in this within the last
>> six months or so, so you need to be sure your kernel is new enough to have
>> the fix.
>>
>
> Well, here rises another doubt. Should I go for a single RAID 1+0 storing OS
> + Data + WAL files or will I be better off with two RAID 1 separating data
> from OS + Wal files?

if you can afford the space, you are almost certainly better off separating
the WAL from the data (I think I've seen debates about which is better,
OS+data/WAL or data/OS+WAL, but very little disagreement that either is
better than combining them all)

David Lang

>> now, if you can afford solid-state drives which don't have noticeable seek
>> times, things are completely different ;-)
>
> Ha, sadly budget is very tight. :)
>
> Regards,
> Fernando.
