Re: Arguments Pro/Contra Software Raid

From: Steve Atkins <steve(at)blighty(dot)com>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>, "Pgsql-Performance ((E-mail))" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Arguments Pro/Contra Software Raid
Date: 2006-05-09 14:41:16
Message-ID: 473FC220-A52C-4954-81E9-A14ED33164C2@blighty.com
Lists: pgsql-general pgsql-performance


On May 9, 2006, at 2:16 AM, Hannes Dorbath wrote:

> Hi,
>
> I've just had some discussion with colleagues regarding the usage
> of hardware or software raid 1/10 for our linux based database
> servers.
>
> I myself can't see much reason to spend $500 on high end controller
> cards for a simple Raid 1.
>
> Any arguments pro or contra would be desirable.
>
> From my experience and what I've read here:
>
> + Hardware Raids might be a bit easier to manage, if you never
> spend a few hours to learn Software Raid Tools.
>
> + There are situations in which Software Raids are faster, as CPU
> power has advanced dramatically in the last years and even high end
> controller cards cannot keep up with that.
>
> + Using SATA drives is always a bit of risk, as some drives are
> lying about whether they are caching or not.

Don't buy those drives. That's unrelated to whether you use hardware
or software RAID.

>
> + Using hardware controllers, the array becomes locked to a
> particular vendor. You can't switch controller vendors as the array
> meta information is stored proprietary. In case the Raid is broken
> to a level the controller can't recover automatically this might
> complicate manual recovery by specialists.

Yes. Fortunately we're using the RAID for database work, rather than
file storage, so we can use all the nice postgresql features for backing
up and replicating the data elsewhere, which avoids most of this issue.

>
> + Even battery backed controllers can't guarantee that data written
> to the drives is consistent after a power outage, neither that the
> drive does not corrupt something during the involuntary shutdown /
> power irregularities. (This is theoretical as any server will be
> UPS backed)

That's what fsync of the WAL is for.

If you have a battery backed writeback cache then you can get the
reliability of fsyncing the WAL for every transaction, and the
performance of not needing to hit the disk for every transaction.
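To see the cost being described on your own hardware, here is a minimal
Python sketch (not PostgreSQL's actual code path) that appends small
records to a scratch file and calls fsync after each one, loosely
imitating a WAL flush per commit. The function name time_commits, the
record size, and the commit count are made up for illustration. The
scratch file lands in the default temp directory, which may be tmpfs or
a different device from your database volume, so point it at the disk
you actually care about before drawing conclusions.

```python
import os
import tempfile
import time


def time_commits(n_commits, record_size=512, sync=True):
    """Append n_commits small records to a scratch file; return elapsed seconds.

    With sync=True every record is followed by os.fsync(), loosely like
    flushing the WAL at each transaction commit. Illustrative only.
    """
    record = b"x" * record_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_commits):
            os.write(fd, record)
            if sync:
                os.fsync(fd)  # wait until the OS reports the data as durable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed


if __name__ == "__main__":
    n = 200  # hypothetical number of "commits"
    for sync in (True, False):
        secs = time_commits(n, sync=sync)
        print(f"sync={sync}: {n / secs:.0f} commits/sec")
```

On a plain disk with its write cache disabled, the sync=True figure
should be far lower than the sync=False one. Behind a battery-backed
write-back cache the two should move much closer together, which is the
effect described above.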

Also, if you're not doing that you'll need to dedicate a pair of
spindles to the WAL if you want to get good performance, so that
there'll be no seeking on the WAL. With a writeback cache you can put
the WAL on the same spindles as the database and not lose much, if
anything, in the way of performance. If that saves you the cost of two
additional spindles, and the space on your drive shelf for them, you've
just paid for a reasonably priced RAID controller.
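As a back-of-envelope check, a common rule of thumb says that a single
drive with no write cache can complete at most about one synchronous
WAL flush per platter rotation, even when the WAL has its own spindles
and never seeks. The sketch below just does that arithmetic. The
function name and the RPM values are illustrative assumptions, and
real numbers depend on the drive, the filesystem, and how many
concurrent commits get grouped into one flush.

```python
def max_commits_per_sec(rpm):
    """Rule-of-thumb ceiling: one synchronous flush per platter rotation."""
    return rpm / 60.0


if __name__ == "__main__":
    # Common spindle speeds, chosen for illustration only.
    for rpm in (7200, 10000, 15000):
        print(f"{rpm} RPM: about {max_commits_per_sec(rpm):.0f} commits/sec")
```

A write-back cache acknowledges the flush from memory, so this
per-rotation ceiling no longer applies to each commit.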

Given those advantages... I can't imagine speccing a large system that
didn't have a battery-backed write-back cache in it. My dev systems
mostly use software RAID, if they use RAID at all. But my production
boxes all use SATA RAID (and I tell my customers to use controllers with
BB cache, whether it be SCSI or SATA).

My usual workloads are write-heavy. If yours are read-heavy that will
move the sweet spot around significantly, and I can easily imagine that
for a read-heavy load software RAID might be a much better match.

Cheers,
Steve
