Re: Postgresql and Software RAID/LVM

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Marty Scholes <marty(at)outputservices(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgresql and Software RAID/LVM
Date: 2005-06-07 04:36:53
Message-ID: 42A52465.2000302@arbash-meinel.com
Lists: pgsql-general pgsql-performance

Marty Scholes wrote:
>> Has anyone ran Postgres with software RAID or LVM on a production box?
>> What have been your experience?
>
> Yes, we have run for a couple years Pg with software LVM (mirroring)
> against two hardware RAID5 arrays. We host a production Sun box that
> runs 24/7.
>
> My experience:
> * Software RAID (other than mirroring) is a disaster waiting to happen.
> If the metadata for the RAID set gives out for any reason (CMOS
> scrambles, card dies, power spike, etc.) then you are hosed beyond
> belief. In most cases it is almost impossible to recover. With
> mirroring, however, you can always boot and operate on a single mirror,
> pretending that no LVM/RAID is underway. In other words, each mirror is
> a fully functional copy of the data which will operate your server.

Isn't losing the metadata actually more of a problem with hardware
RAID? I mean, if the card you are using dies, you can't just swap in
any other card.
With software RAID, because the metadata is on the drives, you can pull
them out of that machine and put them into any machine that has a
controller which can read the drives and a similar kernel, and you are
back up and running.
>
> * Hardware RAID5 is a terrific way to boost performance via write
> caching and spreading I/O across multiple spindles. Each of our
> external arrays operates 14 drives (12 data, 1 parity and 1 hot spare).
> While RAID5 protects against single spindle failure, it will not hedge
> against multiple failures in a short time period, SCSI controller
> failure, SCSI cable problems or even wholesale failure of the RAID
> controller. All of these things happen in a 24/7 operation. Using
> software RAID1 against the hardware RAID5 arrays hedges against any
> single failure.

No, it hedges against *more* than one failure. You can also do RAID1
over RAID5 entirely in software. But if you are honestly willing to
pay for a full RAID1, just create a RAID1 over RAID0; the performance is
much better. And since you have a full RAID1, you can lose up to half
of your drives, as long as both drives of a pairing don't give out.
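
To make the "both drives of a pairing" condition concrete, here is a
minimal sketch. It assumes the array is laid out as mirrored pairs
(what is usually called RAID 10), with drives 2k and 2k+1 forming
pair k. The function name and the numbering are made up for
illustration. Note that if the mirror is instead built over two whole
RAID0 sets, many implementations drop an entire stripe set when one
member fails, so the tolerance is lower than this model suggests.

```python
def mirrored_pairs_survive(num_pairs, failed):
    """Return True if the array still has a good copy of every pair.

    Assumed (hypothetical) layout: drives 2*k and 2*k + 1 mirror each
    other. `failed` is a set of failed drive indices.
    """
    for k in range(num_pairs):
        if 2 * k in failed and 2 * k + 1 in failed:
            # Both copies of this pair are gone, so the data is lost.
            return False
    return True


if __name__ == "__main__":
    # 8 drives in 4 pairs: losing one drive from every pair is survivable.
    print(mirrored_pairs_survive(4, {0, 2, 4, 6}))
    # Losing both halves of a single pair is not.
    print(mirrored_pairs_survive(4, {0, 1}))
```

So half the drives can go, but only if the failures are spread one
per pair; two failures in the wrong place are enough to lose the set.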

If you want the space, but you feel that RAID5 isn't redundant enough,
go to RAID6, which uses 2 parity locations, each with a different method
of storing parity, so not only is it more redundant, you have a better
chance of finding problems.
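
As a rough illustration of "two parity locations, each with a
different method", here is a sketch of the P+Q scheme commonly used
for RAID6: P is a plain XOR of the data blocks, and Q is a weighted
sum in GF(2^8). The field polynomial 0x11d and generator 2 are my
assumption (they are the usual choice, e.g. in Linux md), and all
function names are invented for the example. It only shows
single-block recovery from either parity, not the full two-failure
case.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with polynomial 0x11d (assumed)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r


def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def gf_inv(a):
    # Every nonzero a satisfies a**255 == 1, so a**254 is its inverse.
    return gf_pow(a, 254)


def pq_parity(blocks):
    """Compute P (XOR) and Q (sum of 2**i * D_i) over equal-size blocks."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, blk in enumerate(blocks):
        coef = gf_pow(2, i)
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_from_p(blocks, missing, p):
    """Rebuild blocks[missing] from P and the surviving data blocks."""
    out = bytearray(p)
    for i, blk in enumerate(blocks):
        if i != missing:
            for j, byte in enumerate(blk):
                out[j] ^= byte
    return bytes(out)


def recover_from_q(blocks, missing, q):
    """Rebuild blocks[missing] from Q alone (e.g. when P is also lost)."""
    acc = bytearray(q)
    for i, blk in enumerate(blocks):
        if i != missing:
            coef = gf_pow(2, i)
            for j, byte in enumerate(blk):
                acc[j] ^= gf_mul(coef, byte)
    inv = gf_inv(gf_pow(2, missing))
    return bytes(gf_mul(inv, b) for b in acc)


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    p, q = pq_parity(data)
    damaged = [data[0], None, data[2]]
    print(recover_from_p(damaged, 1, p) == data[1])
    print(recover_from_q(damaged, 1, q) == data[1])
```

Because P and Q are computed differently, a block that is silently
corrupted will generally disagree with both, which is where the
better chance of finding problems comes from.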

>
> * Software mirroring gives you tremendous ability to change the system
> while it is running, by taking offline the mirror you wish to change and
> then synchronizing it after the change.
>

That certainly is a nice ability. But remember that LVM also has the
idea of taking a "snapshot" of a running system. I don't know the exact
details, just that there is a way to have some processes see the
filesystem as it existed at an exact point in time, which is also a
great way to handle backups.
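
As far as I understand it, the snapshot idea is copy-on-write: the
old contents of a block are saved the first time it is overwritten
after the snapshot is taken. The following is only a toy model of
that idea with invented class names, not how LVM actually implements
it.

```python
class Volume:
    """Toy block device that supports copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Before overwriting, save the old block for every snapshot
        # that has not already preserved its own copy of it.
        for snap in self.snapshots:
            snap.saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """Read-only view of a Volume as it was when the snapshot was taken."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # Changed blocks come from the saved copies; unchanged blocks
        # are read straight from the live volume.
        return self.saved.get(index, self.origin.blocks[index])


if __name__ == "__main__":
    vol = Volume([b"old-0", b"old-1"])
    snap = vol.snapshot()
    vol.write(0, b"new-0")
    print(vol.blocks[0], snap.read(0), snap.read(1))
```

A backup process would read through the snapshot while the database
keeps writing to the live volume.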

> On a fully operational production server, we have:
> * restriped the RAID5 array
> * replaced all RAID5 media with higher capacity drives
> * upgraded RAID5 controller
> * moved all data from an old RAID5 array to a newer one
> * replaced host SCSI controller
> * uncabled and physically moved storage to a different part of data center
>
> Again, all of this has taken place (over the years) while our machine
> was fully operational.
>
So you are saying that you were able to replace the RAID controller
without turning off the machine? I realize there do exist
hot-swappable PCI cards, but I think you are overstating what you mean
by "fully operational". For instance, it's not like you can access your
data while it is being physically moved.

I do think you had some nice hardware. But I know you can do all of this
in software as well. It is usually a price/performance tradeoff. You
spend quite a bit to get a hardware RAID card that can keep up with a
modern CPU. I know we have an FC raid box at work which has a full 512MB
of cache on it, but it wasn't that much cheaper than buying a dedicated
server.

John
=:->
