Re: Contemplating SSD Hardware RAID

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Yeb Havinga <yebhavinga(at)gmail(dot)com>, Florian Weimer <fweimer(at)bfk(dot)de>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Contemplating SSD Hardware RAID
Date: 2011-06-21 22:17:21
Message-ID: 4E011871.1040603@2ndQuadrant.com
Lists: pgsql-performance

On 06/21/2011 05:35 PM, Merlin Moncure wrote:
> On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com> wrote:
>
>> On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga<yebhavinga(at)gmail(dot)com> wrote:
>>
>>
>>> It's too bad however that OCZ doesn't let the user
>>> choose which firmware to run (the tool always picks the newest), so after
>>> every upgrade it'll be a surprise what values are supported or if any of the
>>>
>> That right there pretty much eliminates them from consideration for
>> enterprise applications.
>>
> As much as I've been irritated with Intel for being intentionally
> oblique on the write caching issue -- I think they remain more or less
> the only game in town for enterprise use.

That's at the core of why I have been so consistently cranky about
them. The sort of customers I deal with who are willing to spend money
on banks of SSD will buy Intel, and the "Enterprise" feature set seems
complete enough that it doesn't set off any alarms for them. The same
is not true of OCZ, which unfortunately means I never even get them onto
the vendor grid in the first place. Everybody runs out to buy the Intel
units instead, they get burned by the write cache issues, lose data, and
sometimes they even blame PostgreSQL for it.

I have a customer who has around 50 X25-E drives, a little stack of them
in six servers running two similar databases. They each run about a
terabyte, and refill about every four months (old data eventually ages
out, replaced by new). At the point I started working with them, they
had lost the entire recent history twice--terabyte gone,
whoosh!--because the power reliability is poor in their area. And
network connectivity is bad enough that they can't ship this volume of
updates to elsewhere either.

It happened again last month, and for the first time the database was
recoverable. I had converted one server to be a cold spare that just archives
the WAL files. And that's the only one that lived through the nasty
power spike+outage that corrupted the active databases on both the
master and the warm standby of each set. All four of the servers where
PostgreSQL was writing data and expected proper fsync guarantees, all
gone from one power issue. At the point I got involved, they were about
to cancel this entire PostgreSQL experiment because they assumed the
database had to be garbage, given that this kept happening; until I told
them about this known issue, they never considered that the drives were the
problem. That's what I think of when people ask me about the Intel X25-E.
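
For readers unfamiliar with the cold spare arrangement described above: it relies on WAL archiving, where each completed WAL segment is copied to a separate machine. What follows is only a minimal sketch of such a copy step, not the setup used at this site. The function name `archive_segment` and its arguments are hypothetical; the idea is that a script like this could be called from PostgreSQL's `archive_command` with the segment path (`%p`) and file name (`%f`). It assumes a POSIX system, and the fsync calls only help if the archive's storage actually honors flushes, which is the whole point of this thread.

```python
import os
import shutil


def archive_segment(src_path, archive_dir, file_name):
    """Copy one WAL segment into archive_dir, refusing to overwrite.

    Hypothetical helper for illustration; returns the destination path.
    """
    dest = os.path.join(archive_dir, file_name)
    # An archive step should never overwrite an existing segment.
    if os.path.exists(dest):
        raise FileExistsError(dest)

    # Write to a temporary name first so a partial copy is never
    # mistaken for a complete segment.
    tmp = dest + ".tmp"
    with open(src_path, "rb") as src, open(tmp, "wb") as out:
        shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())  # ask the OS to push the data to the device

    os.rename(tmp, dest)

    # fsync the directory so the rename itself is durable (POSIX only).
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return dest
```

Any exception here should turn into a nonzero exit status in the calling script, so that PostgreSQL keeps the segment and retries later.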

I'm very happy with the little 3rd generation consumer grade SSD I
bought from Intel though (320 series). If they just do the same style
of write cache and reliability rework to the enterprise line, but using
better flash, I agree that the first really serious yet affordable
product for the database market may finally come out of that. We're
just not there yet, and unfortunately for the person who started this
round of discussion, throwing hardware RAID at the problem doesn't make
this go away either.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
