Re: Raid 10 chunksize

From: Stef Telford <stef(at)ummon(dot)com>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-03-26 00:50:20
Message-ID: 49CAD14C.3020507@ummon.com
Lists: pgsql-performance


Mark Kirkwood wrote:
> I'm trying to pin down some performance issues with a machine where
> I work; we are seeing (read only) query response times blow out by
> an order of magnitude or more at busy times. Initially we blamed
> autovacuum, but after a tweak of the cost_delay it is *not* the
> problem. Then I looked at checkpoints... and although there was some
> correlation between them and the query response times, I'm thinking
> that the raid chunksize may well be the issue.
>
> Fortunately there is an identical DR box, so I could do a little
> testing. Details follow:
>
> Sun 4140, 2x quad-core Opteron 2356, 16G RAM, 6x 15K 140G SAS,
> Debian Lenny, Pg 8.3.6
>
> The disk is laid out using software (md) raid:
>
> 4 drives: raid 10, *4K* chunksize, holding the database files (ext3
> ordered, noatime)
> 2 drives: raid 1, holding the database transaction logs (ext3
> ordered, noatime)
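
(For reference, an md layout like the one described above would be
created with something along these lines; the device names below are
placeholders, not taken from Mark's mail:

  mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=4 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1

mdadm takes --chunk in kibibytes, so --chunk=4 gives the 4K chunksize
in question.)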
>
> The relevant non default .conf params are:
>
> shared_buffers = 2048MB
> work_mem = 4MB
> maintenance_work_mem = 1024MB
> max_fsm_pages = 153600
> bgwriter_lru_maxpages = 200
> wal_buffers = 2MB
> checkpoint_segments = 32
> effective_cache_size = 4096MB
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_vacuum_cost_delay = 60   # This is high, but seemed to help...
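
(If you want to confirm what the running server actually has in
effect, pg_settings is the easy check; "yourdb" is a placeholder and
the name list is just an example:

  psql -d yourdb -c "SELECT name, setting, unit FROM pg_settings
                     WHERE name IN ('shared_buffers',
                                    'checkpoint_segments',
                                    'wal_buffers');"

Any of the other parameters can be checked the same way.)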
>
> I've run pgbench:
>
> transaction type: TPC-B (sort of)
> scaling factor: 100
> number of clients: 24
> number of transactions per client: 12000
> number of transactions actually processed: 288000/288000
> tps = 655.335102 (including connections establishing)
> tps = 655.423232 (excluding connections establishing)
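
(That run corresponds to roughly the following invocation; the
database name "bench" is a placeholder:

  pgbench -i -s 100 bench
  pgbench -c 24 -t 12000 bench

-s 100 sets the scaling factor at initialisation time, and -c/-t give
the 24 clients and 12000 transactions per client.)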
>
>
> Looking at iostat while it is running shows (note: sda-sdd are the
> raid 10 array, sde and sdf the raid 1):
>
> Device:  rrqm/s   wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm   %util
> sda        0.00    56.80   0.00  579.00   0.00   2.47      8.74    133.76  235.10   1.73  100.00
> sdb        0.00    45.60   0.00  583.60   0.00   2.45      8.59     52.65   90.03   1.71  100.00
> sdc        0.00    49.00   0.00  579.80   0.00   2.45      8.66     72.56  125.09   1.72  100.00
> sdd        0.00    58.40   0.00  565.00   0.00   2.42      8.79    135.31  235.52   1.77  100.00
> sde        0.00     0.00   0.00    0.00   0.00   0.00      0.00      0.00    0.00   0.00    0.00
> sdf        0.00     0.00   0.00    0.00   0.00   0.00      0.00      0.00    0.00   0.00    0.00
>
> Device:  rrqm/s   wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm   %util
> sda        0.00    12.80   0.00   23.40   0.00   0.15     12.85      3.04  103.38   4.27   10.00
> sdb        0.00    12.80   0.00   22.80   0.00   0.14     12.77      2.31   73.51   3.58    8.16
> sdc        0.00    12.80   0.00   21.40   0.00   0.13     12.86      2.38   79.21   3.63    7.76
> sdd        0.00    12.80   0.00   21.80   0.00   0.14     12.70      2.66   90.02   3.93    8.56
> sde        0.00  2546.80   0.00  146.80   0.00  10.53    146.94      0.97    6.38   5.34   78.40
> sdf        0.00  2546.80   0.00  146.60   0.00  10.53    147.05      0.97    6.38   5.53   81.04
>
> Device:  rrqm/s   wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm   %util
> sda        0.00   231.40   0.00  566.80   0.00   3.16     11.41    124.92  228.26   1.76   99.52
> sdb        0.00   223.00   0.00  558.00   0.00   3.06     11.23     46.64   83.55   1.70   94.88
> sdc        0.00   230.60   0.00  551.60   0.00   3.07     11.40     94.38  171.54   1.76   96.96
> sdd        0.00   231.40   0.00  528.60   0.00   2.94     11.37    122.55  220.81   1.83   96.48
> sde        0.00  1495.80   0.00   99.00   0.00   6.23    128.86      0.81    8.15   7.76   76.80
> sdf        0.00  1495.80   0.00   99.20   0.00   6.26    129.24      0.73    7.40   7.10   70.48
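
(Extended per-device stats in MB/s like the above come from something
like:

  iostat -mx 5

assuming a sysstat recent enough to have the -m flag.)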
>
> Top looks like:
>
> Cpu(s):  2.5%us,  1.9%sy,  0.0%ni, 71.9%id, 23.4%wa,  0.2%hi,  0.2%si,  0.0%st
> Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
> Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached
>
> It looks to me like we are maxing out the raid 10 array, and I
> suspect the chunksize (4K) is the culprit. However, as this is a
> pest to change (!), I'd like some opinions on whether I'm jumping to
> conclusions. I'd also appreciate comments about what chunksize to
> use (I've tended to use 256K in the past, but what are folks
> preferring these days?).
>
> regards
>
> Mark
>
>
>
Hello Mark,
Okay, so, take all of this with a pinch of salt, but I have much the
same config as you, with checkpoint_segments raised to 192. The 'test'
database server is a Q8300 with 8GB RAM and 2 x 7200rpm SATA drives on
the motherboard controller, which I then striped together with LVM:

  lvcreate -n data_lv -i 2 -I 64 -L 60G mylv

(expandable under lvm2). That gives me a stripe size of 64KB; there's
a quick way to verify that, shown after the numbers below. Running
pgbench with the same scaling factor:

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 1398.907206 (including connections establishing)
tps = 1399.233785 (excluding connections establishing)
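
If you want to double-check the striping a logical volume actually
got, lvm2 will report it; the field names here are what my lvs
understands, so treat them as approximate:

  lvs --segments -o lv_name,stripes,stripesize mylv/data_lv

or just "lvdisplay -m mylv/data_lv", which prints the stripe count and
stripe size per segment.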

It's also running ext4dev, but this is the 'playground' server, not
the real iron (and I dread making that change on the real iron). In
short, I think the chunksize/stripesize is killing you. Personally, I
would go for 64 or 128... that's just my 2c... feel free to
ignore/scorn/laugh as applicable ;)
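
If you do decide to rebuild the array, checking the current chunksize
and re-creating with a bigger one is roughly the following; array and
device names are placeholders, and re-creating destroys the array
contents, so dump/restore first:

  mdadm --detail /dev/md0 | grep -i chunk    # or: cat /proc/mdstat
  mdadm --stop /dev/md0
  mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=128 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

As far as I know, md of this vintage can't reshape a raid10 chunksize
online, hence the "pest to change" factor you mentioned.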

Regards
Stef
