Re: RAID stripe size question

From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: "Mikael Carneholm" <Mikael(dot)Carneholm(at)wirelesscar(dot)com>
Cc: "Markus Schaber" <schabi(at)logix-tt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: RAID stripe size question
Date: 2006-07-17 15:23:23
Message-ID: 33c6269f0607170823v7df531a8o678505125e85880@mail.gmail.com
Lists: pgsql-performance

On 7/17/06, Mikael Carneholm <Mikael(dot)Carneholm(at)wirelesscar(dot)com> wrote:
>
> >> This is something I'd also like to test, as a common
> >> best-practice these days is to go for a SAME (stripe all, mirror
> >> everything) setup.
> >> From a development perspective it's easier to use SAME as the
> >> developers won't have to think about physical location for new
> >> tables/indices, so if there's no performance penalty with SAME I'll
> >> gladly keep it that way.
>
> >Usually, it's not the developers' task to care about that, but the DBA's
> >responsibility.
>
> As we don't have a full-time dedicated DBA (although I'm the one who does
> most DBA-related tasks) I would aim for making physical location as
> transparent as possible, otherwise I'm afraid I won't be doing anything
> else than supporting developers with that - and I *do* have other things
> to do as well :)
>
> >> In a previous test, using cd=5000 and cs=20 increased transaction
> >> throughput by ~20% so I'll definitely fiddle with that in the coming
> >> tests as well.
>
> >How many parallel transactions do you have?
>
> That was when running BenchmarkSQL
> (http://sourceforge.net/projects/benchmarksql) with 100 concurrent users
> ("terminals"), which I assume means 100 parallel transactions at most.
> The target application for this DB has 3-4 times as many concurrent
> connections so it's possible that one would have to find other cs/cd
> numbers better suited for that scenario. Tweaking bgwriter is another
> task I'll look into as well..
>
> Btw, here's the bonnie++ results from two different array sets (10+18,
> 4+24) on the MSA1500:
>
> LUN: WAL, 10 disks, stripe size 32K
> ------------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 56139  93 73250  22 16530   3 30488  45 57489   5 477.3   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2458  90 +++++ +++ +++++ +++  3121  99 +++++ +++ 10469  98
>
>
> LUN: WAL, 4 disks, stripe size 8K
> ----------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 49170  82 60108  19 13325   2 15778  24 21489   2 266.4   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2432  86 +++++ +++ +++++ +++  3106  99 +++++ +++ 10248  98
>
>
> LUN: DATA, 18 disks, stripe size 32K
> -------------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 59990  97 87341  28 19158   4 30200  46 57556   6 495.4   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  1640  92 +++++ +++ +++++ +++  1736  99 +++++ +++ 10919  99
>
>
> LUN: DATA, 24 disks, stripe size 64K
> -------------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 59443  97 118515 39 25023   5 30926  49 60835   6 531.8   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2499  90 +++++ +++ +++++ +++  2817  99 +++++ +++ 10971 100

These bonnie++ numbers are very worrying. Your controller should easily max
out your FC interface on these tests, passing 192MB/sec with ease on anything
more than a 6-drive RAID 10. This is a bad omen if you want high
performance... Each mirror pair can do 60-80MB/sec. A 24-disk RAID 10 can
do 12*60MB/sec, which is 720MB/sec. I have seen this performance, it's not
unreachable, but time and again we see these bad perf numbers from FC and
SCSI systems alike. Consider a different controller, because this one is
not up to snuff. A single drive would get better numbers than your 4-disk
RAID 10; 21MB/sec read speed is really pretty sorry, it should be closer to
120MB/sec. If you can't swap it out, software RAID may turn out to be your
friend. The only saving grace is that this is OLTP, and perhaps, just
maybe, the controller will be better at ordering IOs, but I highly doubt it.
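
To make the arithmetic concrete, here is a rough back-of-envelope sketch
(not a model of any particular controller). It assumes the arrays are
RAID 10, uses the low end of the 60-80MB/sec per-mirror-pair figure, and
treats 192MB/sec as the FC ceiling; the function name and the K/sec to
MB/sec conversion (1 MB = 1024 K) are my own assumptions for illustration.
With 12 pairs at 60MB/sec the raw estimate comes to 720MB/sec.

```python
# Back-of-envelope RAID 10 sequential throughput, using figures from this thread.
# The 60 MB/sec per mirror pair and the 192 MB/sec FC ceiling come from the
# message; the function itself is an illustrative assumption, not a benchmark.

def raid10_estimate_mb_s(disks, per_pair_mb_s=60, link_limit_mb_s=None):
    """Estimate sequential MB/sec for a RAID 10 built from `disks` drives."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks (at least 2)")
    pairs = disks // 2                      # each mirror pair acts as one stripe member
    raw = pairs * per_pair_mb_s
    return raw if link_limit_mb_s is None else min(raw, link_limit_mb_s)

# Block-read figures (K/sec) from the four bonnie++ runs quoted above,
# keyed by the number of disks in each LUN.
measured_k_s = {4: 21489, 10: 57489, 18: 57556, 24: 60835}

for disks, k_s in measured_k_s.items():
    raw = raid10_estimate_mb_s(disks)
    capped = raid10_estimate_mb_s(disks, link_limit_mb_s=192)
    # Assumes 1 MB = 1024 K when converting the bonnie++ figures.
    print(f"{disks:2d} disks: raw {raw} MB/sec, FC-capped {capped} MB/sec, "
          f"measured {k_s / 1024:.1f} MB/sec")
```

Even with the FC cap applied, every measured figure sits well below the
estimate, which is the gap I am complaining about.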

Please people, do the numbers and benchmark before you buy. Many, many HBAs
really suck under Linux/FreeBSD, and you may end up paying vast sums of
money for very sub-optimal performance (I'd say sub-standard, but alas, it
seems that this kind of poor performance is tolerated, even though it's way
off where it should be). There's no point having a 40-disk cab if your
controller can't handle it.
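
If bonnie++ is not at hand, even a crude sequential-read timing gives a
first sanity check. The sketch below is only that: the function name, the
1MB block size and the command-line usage are assumptions of mine, and it
does nothing to bypass the OS page cache, so the test file should be
considerably larger than RAM or the result will be inflated.

```python
import sys
import time

def sequential_read_mb_s(path, block_size=1024 * 1024):
    """Read `path` from start to end and return the throughput in MB/sec."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    megabytes = total / (1024 * 1024)
    return megabytes / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python seqread.py /mnt/array/bigfile
    print(f"{sequential_read_mb_s(sys.argv[1]):.1f} MB/sec")
```

Compare the figure against what the mirror pairs should deliver in
aggregate; a large shortfall points at the controller or the link rather
than the drives.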

Maximum theoretical linear throughput can be achieved in a white box for
under $20k, and I have seen this kind of system outperform a server 5 times
its price even in OLTP.

Alex
