Re: Raid 10 chunksize

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-03-26 04:37:00
Message-ID: 49CB066C.5040203@paradise.net.nz
Lists: pgsql-performance

Greg Smith wrote:
> On Wed, 25 Mar 2009, Mark Kirkwood wrote:
>
>> I'm thinking that the raid chunksize may well be the issue.
>
> Why? I'm not saying you're wrong, I just don't see why that parameter
> jumped out as a likely cause here.
>

See my other post - however, I agree: it wasn't clear whether split
writes (from the small chunk size) were killing us or whether the array
was simply maxed out...

>> Sun 4140 2x quad-core opteron 2356 16G RAM, 6x 15K 140G SAS
>
> That server doesn't have any sort of write cache on it, right? That
> means that all the fsync's done near checkpoint time are going to
> thrash your disks around. One thing you can do to improve that
> situation is push checkpoint_segments up to the maximum you can
> possibly stand. You could consider double or even quadruple what
> you're using right now, the recovery time after a crash will spike
> upwards a bunch though. That will minimize the number of checkpoints
> and reduce the average disk I/O they produce per unit of time, due to
> how they're spread out in 8.3. You might bump upwards
> checkpoint_completion_target to 0.9 in order to get some improvement
> without increasing recovery time as badly.
>

Yeah, no write cache at all.
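
For anyone following along, the change being suggested amounts to
something like this in postgresql.conf (values illustrative rather than
tuned - checkpoint_segments in particular needs to be sized against how
long a crash recovery we can stand):

    checkpoint_segments = 64             # e.g. double/quadruple the current value
    checkpoint_completion_target = 0.9   # spread checkpoint writes out (8.3 default is 0.5)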

> Also, if you want to minimize total I/O, you might drop
> bgwriter_lru_maxpages to 0. That feature presumes you have some spare
> I/O capacity you use to prioritize lower latency, and it sounds like
> you don't. You get the lowest total I/O per transaction with the
> background writer turned off.
>

Right - but wouldn't that mean a big, very noticeable stall when you do
have to checkpoint? We want to avoid that, I think, even at the cost of
a little overall throughput.
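
(For reference, that experiment is just

    bgwriter_lru_maxpages = 0    # disable the LRU background writer scan

in postgresql.conf, so it's easy enough to try and revert if the
checkpoint stalls turn out to cost more than the saved I/O.)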

> You happened to catch me on a night where I was running some pgbench
> tests here, so I can give you something similar to compare against.
> Quad-core system, 8GB of RAM, write-caching controller with 3-disk
> RAID0 for database and 1 disk for WAL; Linux software RAID though.
> Here's the same data you collected at the same scale you're testing,
> with similar postgresql.conf settings too (same shared_buffers and
> checkpoint_segments, I didn't touch any of the vacuum parameters):
>
> number of clients: 32
> number of transactions per client: 6250
> number of transactions actually processed: 200000/200000
> tps = 1097.933319 (including connections establishing)
> tps = 1098.372510 (excluding connections establishing)
>
> Cpu(s):  3.6%us,  1.0%sy,  0.0%ni, 57.2%id, 37.5%wa,  0.0%hi,  0.7%si,  0.0%st
> Mem: 8174288k total, 5545396k used, 2628892k free, 473248k buffers
> Swap: 0k total, 0k used, 0k free, 4050736k cached
>
> sda,b,d are the database, sdc is the WAL, here's a couple of busy
> periods:
>
> Device:  rrqm/s   wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        0.00   337.26  0.00  380.72   0.00   2.83     15.24    104.98  278.77   2.46  93.55
> sdb        0.00   343.56  0.00  386.31   0.00   2.86     15.17     91.32  236.61   2.46  94.95
> sdd        0.00   342.86  0.00  391.71   0.00   2.92     15.28    128.36  342.42   2.43  95.14
> sdc        0.00   808.89  0.00   45.45   0.00   3.35    150.72      1.22   26.75  21.13  96.02
>
> Device:  rrqm/s   wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        0.00   377.82  0.00  423.38   0.00   3.13     15.12     74.24  175.21   1.41  59.58
> sdb        0.00   371.73  0.00  423.18   0.00   3.13     15.15     50.61  119.81   1.41  59.58
> sdd        0.00   372.93  0.00  414.99   0.00   3.06     15.09     60.02  144.32   1.44  59.70
> sdc        0.00  3242.16  0.00  258.84   0.00  13.68    108.23      0.88    3.42   2.96  76.60
>
> They don't really look much different from yours. I'm using software
> RAID and haven't touched any of its parameters; didn't even use
> noatime on the ext3 filesystems (you should though--that's one of
> those things the write cache really helps out with in my case).
>
Yeah - with a 64K chunk size I'm seeing a result more in line with
yours (866 or so for 24 clients). I think another pair of disks, giving
us 3 effective disks for the database, would get us close to your
numbers... but in the meantime I'm trying to get the best out of
what's there!
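
In case anyone wants to reproduce the chunk size comparison, rebuilding
the array at a given chunk size is along these lines (illustrative only
- this assumes Linux md raid 10, and the device names are placeholders):

    mdadm --create /dev/md0 --level=10 --chunk=64 --raid-devices=6 /dev/sd[b-g]

with the same pgbench run (e.g. pgbench -c 24 -t 8000 pgbench, again
just an example invocation) repeated against each layout.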

Thanks for your help

Mark
