
Re: RAID stripe size question

From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: "Mikael Carneholm" <Mikael(dot)Carneholm(at)wirelesscar(dot)com>
Cc: "Markus Schaber" <schabi(at)logix-tt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: RAID stripe size question
Date: 2006-07-17 15:23:23
Message-ID: 33c6269f0607170823v7df531a8o678505125e85880@mail.gmail.com
Lists: pgsql-performance
On 7/17/06, Mikael Carneholm <Mikael(dot)Carneholm(at)wirelesscar(dot)com> wrote:
>
> >> This is something I'd also like to test, as a common best practice
> >> these days is to go for a SAME (stripe all, mirror everything) setup.
> >> From a development perspective it's easier to use SAME, as the
> >> developers won't have to think about physical location for new
> >> tables/indices; so if there's no performance penalty with SAME I'll
> >> gladly keep it that way.
>
> >Usually, it's not the developers' task to care about that, but the
> >DBA's responsibility.
>
> As we don't have a full-time dedicated DBA (although I'm the one who does
> most DBA-related tasks) I would aim for making physical location as
> transparent as possible; otherwise I'm afraid I won't be doing anything
> other than supporting developers with that - and I *do* have other things
> to do as well :)
>
> >> In a previous test, using cd=5000 and cs=20 increased transaction
> >> throughput by ~20% so I'll definitely fiddle with that in the coming
> >> tests as well.
>
> >How many parallel transactions do you have?
>
> That was when running BenchmarkSQL
> (http://sourceforge.net/projects/benchmarksql) with 100 concurrent users
> ("terminals"), which I assume means 100 parallel transactions at most.
> The target application for this DB has 3-4 times as many concurrent
> connections so it's possible that one would have to find other cs/cd
> numbers better suited to that scenario. Tweaking the bgwriter is another
> task I'll look into as well.
>
> Btw, here's the bonnie++ results from two different array sets (10+18,
> 4+24) on the MSA1500:
>
> LUN: WAL, 10 disks, stripe size 32K
> ------------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 56139  93 73250  22 16530   3 30488  45 57489   5 477.3   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2458  90 +++++ +++ +++++ +++  3121  99 +++++ +++ 10469  98
>
>
> LUN: WAL, 4 disks, stripe size 8K
> ----------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 49170  82 60108  19 13325   2 15778  24 21489   2 266.4   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2432  86 +++++ +++ +++++ +++  3106  99 +++++ +++ 10248  98
>
>
> LUN: DATA, 18 disks, stripe size 32K
> -------------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 59990  97 87341  28 19158   4 30200  46 57556   6 495.4   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  1640  92 +++++ +++ +++++ +++  1736  99 +++++ +++ 10919  99
>
>
> LUN: DATA, 24 disks, stripe size 64K
> -------------------------------------
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> sesell01        32G 59443  97 118515 39 25023   5 30926  49 60835   6 531.8   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2499  90 +++++ +++ +++++ +++  2817  99 +++++ +++ 10971 100



These bonnie++ numbers are very worrying.  Your controller should easily max
out your FC interface on these tests, passing 192MB/sec with ease on anything
more than a 6-drive RAID 10.  This is a bad omen if you want high
performance...  Each mirror pair can do 60-80MB/sec, so a 24-disk RAID 10 can
do 12*60MB/sec, which is 720MB/sec - I have seen this performance, it's not
unreachable, but time and again we see these bad perf numbers from FC and
SCSI systems alike.  Consider a different controller, because this one is
not up to snuff.  A single drive would get better numbers than your 4-disk
RAID 10; 21MB/sec read speed is really pretty sorry, it should be closer to
120MB/sec.  If you can't swap the controller out, software RAID may turn out
to be your friend.  The only saving grace is that this is OLTP, and perhaps,
just maybe, the controller will be better at ordering IOs - but I highly
doubt it.
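The arithmetic behind that ceiling can be sketched as follows (the function name and the 60MB/sec per-pair figure are illustrative, taken from the rough numbers above, not from any measurement):

```python
def raid10_seq_read_mb_s(disks, per_pair_mb_s=60):
    """Rough ceiling for RAID 10 sequential read throughput.

    RAID 10 groups disks into mirror pairs and stripes across the
    pairs, so large sequential reads scale with the pair count:
    roughly (disks / 2) * per-pair throughput.
    """
    pairs = disks // 2
    return pairs * per_pair_mb_s

print(raid10_seq_read_mb_s(24))  # 12 pairs * 60 MB/sec -> prints 720
print(raid10_seq_read_mb_s(4))   # 2 pairs * 60 MB/sec -> prints 120
```

Comparing that 120MB/sec estimate for the 4-disk set against the ~21MB/sec bonnie++ measured shows how far off the controller is.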

Please people, do the numbers, benchmark before you buy: many many HBAs
really suck under Linux/FreeBSD, and you may end up paying vast sums of
money for very sub-optimal performance (I'd say sub-standard, but alas, it
seems that this kind of poor performance is tolerated, even though it's way
off where it should be).  There's no point having a 40-disk cab if your
controller can't handle it.
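bonnie++ gives the full picture, but even a quick dd pass (a swapped-in sanity check, not what was run above) will expose a grossly underperforming array before you commit to the hardware. A minimal sketch; on real hardware use a file larger than RAM so the page cache doesn't inflate the read figure:

```shell
# Write then read back a test file and compare the reported MB/s
# against what the spindle count should deliver.
TESTFILE=/tmp/seqtest.dat
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fsync 2>&1 | tail -1
dd if="$TESTFILE" of=/dev/null bs=1M 2>&1 | tail -1
rm -f "$TESTFILE"
```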

Maximum theoretical linear throughput can be achieved in a white box for
under $20k, and I have seen this kind of system outperform a server 5 times
its price, even in OLTP.

Alex
