Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com>
Cc: Matthew Wakeling <matthew(at)flymine(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller
Date: 2009-02-18 19:26:58
Message-ID: C5C1A102.2857%scott@richrelevance.com
Lists: pgsql-performance


On 2/17/09 11:52 PM, "Rajesh Kumar Mallah" <mallah(dot)rajesh(at)gmail(dot)com> wrote:

the raid10 volume was benchmarked again,
taking into consideration the above points

Effect of ReadAhead Settings
(disabled, 256 (default), 512, 1024)

xfs_ra0      414741,  66144
xfs_ra256    403647, 545026    (all tests on sda6)
xfs_ra512    411357, 564769
xfs_ra1024   404392, 431168

looks like 512 was the best setting for this controller

Try 4096 or 8192 (or, just to see, 32768); with a sufficient readahead value you should get numbers very close to a raw partition with xfs. It is controller dependent for sure, but I usually see a "small peak" in performance at 512 or 1024, followed by a dip, then a larger peak and plateau at somewhere near # of drives * the small peak. The higher quality the controller, the less you need to fiddle with this.
I use a script that runs fio benchmarks with the following profiles, with readahead values from 128 to 65536. The single-reader STR test peaks at a smaller readahead value than the concurrent-reader ones (2 or 8 concurrent sequential readers), and the mixed random/sequential read loads become more biased toward sequential transfer (and thus higher overall throughput in bytes/sec) with larger readahead values. The choice between the cfq and deadline schedulers, however, affects the priority of random vs. sequential reads more than readahead does - cfq favors random access because it divides I/O by time slice.
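
For reference, the sweep itself is nothing fancy; a rough sketch only (run as root -- the device name, job file name, and the list of readahead values are placeholders to adjust for your system; the read profiles below already drop the page cache via exec_prerun):

for ra in 128 256 512 1024 2048 4096 8192 16384 32768 65536; do
    /sbin/blockdev --setra $ra /dev/sda6          # readahead in 512-byte blocks
    fio --output=fio_seq_ra${ra}.log read-seq.fio # one result file per setting
done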

The FIO profiles I use for benchmarking are at the end of this message.

Considering these two figures:

xfs25    350661, 474481    (/dev/sda7)
25xfs    404291, 547672    (/dev/sda6)

looks like the beginning of the drives is ~15% faster
than the ending sections. considering this, is it worth
creating a special tablespace at the beginning of the drives?

For SAS drives, it's typically a ~15% to 25% degradation (the last 5% is definitely slow). For 3.5" SATA drives, the last 5% runs at about 50% of the STR of the front.
Graphs about halfway down this page show what it looks like for a typical SATA drive: http://www.tomshardware.com/reviews/Seagate-Barracuda-1-5-TB,2032-5.html
And a couple of figures for some SAS drives here: http://www.storagereview.com/ST973451SS.sr?page=0%2C1
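
If you do carve out the fast zone for a tablespace, the mechanics are simple. A rough sketch, assuming a dedicated partition at the start of the disk (the device, mount point, and tablespace name below are made-up placeholders, not anything from this thread):

mkfs.xfs /dev/sdX1                                  # partition at the start of the disk
mkdir -p /data/fast && mount /dev/sdX1 /data/fast
mkdir /data/fast/pg_fast && chown postgres:postgres /data/fast/pg_fast
psql -U postgres -c "CREATE TABLESPACE fast_ts LOCATION '/data/fast/pg_fast';"

Whether the gain justifies the extra management is a judgment call; it mostly helps if the hot tables and indexes actually fit in that zone.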

> If testing STR, you will also want to tune the block device readahead value (example: /sbin/blockdev --getra /dev/sda6).
> This has a very large impact on sequential transfer performance (and no impact on random access). How large of an
> impact depends quite a bit on what kernel you're on, since the readahead code has been getting better over time and
> requires less tuning. But it still defaults out-of-the-box to settings more optimal for a single drive than for RAID.
> For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5 for RAID 10). For SATA, try 1024 or
> 2048 * the number of effective spindles. The value is in blocks (512 bytes). There is documentation on the
> blockdev command, and here is a little write-up I found with a couple of web searches:
> http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
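
To make the spindle math concrete (hypothetical numbers: 8 SAS drives in RAID 10 = 4 effective spindles, so 512 * 4 = 2048 blocks, i.e. 1MB of readahead):

/sbin/blockdev --setra 2048 /dev/sda6
/sbin/blockdev --getra /dev/sda6     # reports the current value in 512-byte blocks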

FIO benchmark profile examples (long, posting here for the archives):
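
Each profile is meant to be saved as its own job file and run standalone, as root so that exec_prerun can drop the page cache, e.g. (file name is just an example):

fio read-seq.fio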

*Read benchmarks, sequential:

[read-seq]
; one sequential reader reading one 64g file
rw=read
size=64g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
; two sequential readers, each concurrently reading a 32g file, for a total of 64g max
rw=read
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=2
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
; eight sequential readers, each concurrently reading a 8g file, for a total of 64g max
rw=read
size=8g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

*Read benchmarks, random 8k reads.

[read-rand]
; random access on 2g file by single reader, best case scenario.
rw=randread
size=2g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-rand]
; 8 concurrent random readers each to its own 1g file
rw=randread
size=1g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

*Mixed Load:

[global]
; one random reader concurrently with one sequential reader.
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[seq-read]
rw=read
size=64g
numjobs=1
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=1
nrfiles=1

[global]
; Four sequential readers concurrent with four random readers
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
rw=read
size=8g
numjobs=4
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=4
nrfiles=1

*Write tests

[write-seq]
rw=write
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
end_fsync=1

[write-rand]
rw=randwrite
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
; overwrite= 1 is MANDATORY for xfs, otherwise the writes are sparse random writes and can slow performance to near zero. Postgres only does random re-writes, never sparse random writes.
overwrite=1
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
end_fsync=1
