Re: New server: SSD/RAID recommendations?

From: "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>
To: "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>
Cc: Vitalii Tymchyshyn <vit(at)tym(dot)im>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "Wes Vaske (wvaske)" <wvaske(at)micron(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: New server: SSD/RAID recommendations?
Date: 2015-07-07 16:58:49
Message-ID: E3ACE58A-4E08-4D32-845A-00352834B1F7@skogoglandskap.no
Lists: pgsql-performance

>
> This raises another interesting question. Does anyone here have a document explaining how their BBU cache works EXACTLY (at cache / SATA level) on their server? Because I haven't been able to find any for mine (Dell PERC H710/H710P). Can anyone tell me with godlike authority and precision, what exactly happens inside that BBU post-power failure?

(And if you have that manual, how can you know it's accurate, that the implementation matches the manual and is free of bugs? My M500s didn't match the packaging, and neither did an H710 we bought: Dell had advertised features in some marketing material that were only present on the H710P.)

And I see UBER (uncorrectable bit error rate) figures for SSDs and HDDs, but has anyone ever seen them for the flash-based cache on their RAID controller?

Sleep well, friends.

Graeme.

On 07 Jul 2015, at 18:54, Graeme B. Bell <graeme(dot)bell(at)nibio(dot)no> wrote:

>
> That is a very good question, which I have raised elsewhere on the postgresql lists previously.
>
> In practice: I have *never* managed to make diskchecker fail with the BBU enabled in front of the drives. I spent days trying with plug pulls, until I reached the point where, statistically, a failure just can't be that likely. That's not to say it can never happen, only that I've taken all the reasonable measures I could on the time and money budget I had available.
>
> In theory: it may be that the BBU makes the drives run at about half speed, so the capacitors go a good bit further towards emptying the cache. After all, without the BBU in the way, the drive manages to save everything but the last fragment of writes. But I also suspect that the controller itself may be replaying the last set of writes from around the time of power loss.
>
> Anyway I'm 50/50 on those two explanations. Any other thoughts welcome.
>
> This raises another interesting question. Does anyone here have a document explaining how their BBU cache works EXACTLY (at cache / SATA level) on their server? Because I haven't been able to find any for mine (Dell PERC H710/H710P). Can anyone tell me with godlike authority and precision, what exactly happens inside that BBU post-power failure?
>
> There is rather too much magic involved for me to be happy.
>
> G
>
> On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:
>
>> Hi.
>>
>> How would the BBU cache help you if the drive lies about fsync? I suppose any RAID controller removes data from the BBU cache once the drive reports it as fsynced. As far as I know, there is no other "magic command" by which the drive can tell the controller that the data is now safe and can be removed from the BBU cache.
>>
>> Tue, 7 Jul 2015 at 11:59, Graeme B. Bell <graeme(dot)bell(at)nibio(dot)no> wrote:
>>
>> Yikes. I would not be able to sleep tonight if it were not for the BBU cache in front of these disks...
>>
>> diskchecker.pl consistently reported several examples of corruption post-power-loss (usually 10-30) on unprotected M500s/M550s, so I think it's pretty much open to debate what types of madness and corruption you'll find if you look closely enough.
>>
>> G
>>
>>
>> On 07 Jul 2015, at 16:59, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>
>>>
>>> So it lies about fsync()... The next question is, does it nevertheless enforce the correct ordering of persisting fsync'd data? If you write to file A and fsync it, then write to another file B and fsync it too, is it guaranteed that if B is persisted, A is as well? Because if it isn't, you can end up with filesystem (or database) corruption anyway.
>>>
>>> - Heikki
>>
>>
>>
>
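
Heikki's ordering question above (write A, fsync, write B, fsync; if B survives, A must too) can be sketched as a small check. This is only an illustration under assumptions, not diskchecker.pl and not anything the posters ran: the file names A and B, the 16-byte record format, and the helper names are all hypothetical. The idea would be to run the writer, pull the plug, and run the check after reboot.

```python
import os

RECORD_SIZE = 16  # fixed-width records, so a torn trailing record is ignored


def append_and_fsync(path, seq):
    """Append one fixed-width record and ask the OS to flush it to the device."""
    record = b"%015d\n" % seq  # 15 digits + newline = 16 bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)  # a drive that lies about fsync may return before data is durable
    finally:
        os.close(fd)


def complete_records(path):
    """Number of whole records in the file; a missing file counts as zero."""
    try:
        return os.path.getsize(path) // RECORD_SIZE
    except FileNotFoundError:
        return 0


def run_writer(directory, rounds):
    """For each sequence number, persist it to A first and only then to B."""
    a = os.path.join(directory, "A")
    b = os.path.join(directory, "B")
    for seq in range(rounds):
        append_and_fsync(a, seq)
        append_and_fsync(b, seq)


def ordering_preserved(directory):
    """After a crash: B must never hold more complete records than A."""
    a = os.path.join(directory, "A")
    b = os.path.join(directory, "B")
    return complete_records(a) >= complete_records(b)
```

If ordering_preserved() returns False after a power cut, a later fsync'd write reached stable storage while an earlier one did not, which is exactly the case Heikki warns about. A real test would also need to fsync the containing directory after creating the files and to compare record contents, not just counts.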
