Re: New server: SSD/RAID recommendations?

From: "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: "Wes Vaske (wvaske)" <wvaske(at)micron(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: New server: SSD/RAID recommendations?
Date: 2015-07-07 10:22:00
Message-ID: 52A08B87-2A8A-43F2-92CE-2514F5AD141B@skogoglandskap.no
Lists: pgsql-performance


Completely agree with Steve.

1. Intel NVMe looks like the best bet if you have modern enough hardware for NVMe. Otherwise, consider e.g. the S3700 mentioned elsewhere.

2. RAID controllers.

We have roughly 10-12 of these here and roughly 25-30 SSDs, spread across various machines.
This might give people an idea of where the risk lies in the path from disk to CPU.

We've had 2 RAID card failures in the last 12 months that nuked the array with days of downtime, and 2 problems with batteries suddenly becoming useless or suddenly reporting wildly varying temperatures/overheating. There may have been other RAID problems I don't know about.

Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.

So by my estimates:
30% annual problem rate with RAID controllers
30-50% failure rate with Seagate HDDs (Backblaze saw similar results)
0% failure rate with HGST HDDs.
0% failure rate with our SSDs. (To be fair, our one Samsung SSD apparently has a TRIM bug under Linux; I'll need to investigate whether we have been affected by it.)
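
As a rough sanity check on the RAID controller figure, here is a minimal Python sketch of the arithmetic. It assumes the 2 card failures and the 2 battery problems all count as "problems", and that all 10-12 controllers were in service for the full 12 months. The function name annual_rate is made up for illustration.

```python
def annual_rate(events_per_year: float, population: float) -> float:
    """Return events per unit per year as a fraction (0.25 means 25%)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events_per_year / population


# Assumption: card failures and battery problems are counted together.
raid_events = 2 + 2

# The controller count is only known as a range (10-12), so compute both ends.
raid_low = annual_rate(raid_events, 12)
raid_high = annual_rate(raid_events, 10)

print(f"RAID controller problem rate: {raid_low:.0%} to {raid_high:.0%}")
```

With those counts the result lands a little above the rounded 30% quoted above. The same function can be applied to the disk figures, but the result depends heavily on the guessed population size.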

Also, RAID controllers aren't free: the cost is not just the money but also the management of them (ever tried writing a complex install script that interacts with MegaCLI? It can be done, but it's not much fun). Just take a look at the MegaCLI manual and ask yourself: is this even worth it if you have a good MTBF on an enterprise SSD?

RAID was meant to be about ensuring availability of data. I have trouble believing that these days....

Graeme Bell

On 06 Jul 2015, at 18:56, Steve Crawford <scrawford(at)pinpointresearch(dot)com> wrote:

>
> 2. We don't typically have redundant electronic components in our servers. Sure, we have dual power supplies and dual NICs (though generally to handle external failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no backup RAID card. Intel Enterprise SSD already have power-fail protection so I don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD I'm left to wonder if placing a RAID card in front merely adds a new point of failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at you, RAID backup battery).
