Re: Intel SSDs that may not suck

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Intel SSDs that may not suck
Date: 2011-03-29 18:19:37
Message-ID: 4D9222B9.9020109@2ndQuadrant.com
Lists: pgsql-performance

On 03/29/2011 06:34 AM, Yeb Havinga wrote:
> While I appreciate the heads up about these new drives, your posting
> suggests (though you formulated it in a way that you do not actually say
> it) that OCZ products do not have long-term reliability. No factual
> data. If you have knowledge of SandForce-based OCZ drives failing, that'd
> be interesting, because that's the product line the new Intel SSD
> ought to be compared with.

I didn't want to say anything too strong until I got to the bottom of
the reports I'd been sorting through. It turns out that there is a
widespread incompatibility between OCZ drives and some popular Gigabyte
motherboards:
http://www.ocztechnologyforum.com/forum/showthread.php?76177-do-you-own-a-Gigabyte-motherboard-and-have-the-SMART-error-with-FW1.11...look-inside

(I'm typing this message on a system with one of the affected
combinations, which is one reason I don't own a Vertex 2 Pro yet. Having
to run a "Beta BIOS" does not inspire confidence.)

On the affected models, you can't get SMART data from the drive. That
means no monitoring for the sort of expected failures we all know can
happen with any drive. So far, that looks to be at the bottom of all the
anecdotal failure reports I'd found: the drives may have been throwing
bad sectors or showing some other early sign of failure, and the owners
had no idea because they thought SMART would warn them, but it wasn't
working at all. Thus, they didn't find out there was a problem until the
drive just died altogether one day.

More popular doesn't always mean more reliable, but for stuff like this
it helps. Intel ships so many more drives than OCZ that I'd be shocked
if Gigabyte themselves didn't have reference samples of them for
testing. This really looks more like a warning that you should be
particularly aggressive about checking SMART when running recently
introduced drives, which it sounds like you are already doing.

Reliability in this area is so strange... A diversion into older drives
gives an idea of how annoyed I am about all this. Last year, I gave up on
Western Digital's consumer drives (again). Not because the failure
rates were bad, but because the one failure I did run into was so
terrible from a SMART perspective. The drive lied about the whole
problem so aggressively that I couldn't manage the process. I couldn't
get the drive to admit it had a problem serious enough to make it an
RMA candidate, despite its failing every time I ran an aggressive SMART
error check. It would reallocate a few sectors, say "good as new!", and
then fail at the next block when I re-tested. It did that at least a
dozen times before I threw it in the "pathological drives" pile I keep
around for torture testing.

Meanwhile, the Seagate drives I switched back to are terrible from a
failure percentage perspective. I just had two start to go bad last
week, both halves of an array, which is always fun. But the failure
started with very clearly labeled increases in reallocated sectors, and
the drive that eventually went really bad (making the bad noises) was
sent back for RMA. If you've got redundancy, I'll take components
that fail cleanly over ones that hide what's going on, even if the one
that fails cleanly is actually more likely to fail. With a rebuild
always a drive swap away, having accurate data makes even a higher
failure rate manageable.
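Watching for those clearly labeled increases is easy to automate. This minimal sketch assumes the ATA attribute table printed by `smartctl -A` (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and a baseline count you stored from an earlier run; the function names are hypothetical.

```python
from typing import Optional

# Column index of RAW_VALUE in an assumed ATA `smartctl -A` attribute row:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
RAW_VALUE_COLUMN = 9


def reallocated_sectors(smartctl_attrs: str) -> Optional[int]:
    """Return the raw Reallocated_Sector_Ct from `smartctl -A` output, or None.

    None means the attribute could not be read; callers should treat that as
    a problem rather than as a count of zero.
    """
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        if len(fields) > RAW_VALUE_COLUMN and fields[1] == "Reallocated_Sector_Ct":
            raw = fields[RAW_VALUE_COLUMN]
            return int(raw) if raw.isdigit() else None
    return None


def check_drive(current_output: str, baseline: int) -> str:
    """Compare the current reallocated-sector count against a stored baseline."""
    count = reallocated_sectors(current_output)
    if count is None:
        return "ALERT: no reallocated-sector data"
    if count > baseline:
        return f"ALERT: reallocated sectors rose from {baseline} to {count}"
    return "ok"
```

Comparing against a stored baseline rather than against zero catches new reallocations on a drive that already had a few, which is the early warning that made the Seagate failures manageable.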

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
