Re: Intel SSDs that may not suck

From: "Strange, John W" <john(dot)w(dot)strange(at)jpmchase(dot)com>
To: Jeff <threshar(at)torgo(dot)978(dot)org>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andy <angelflow(at)yahoo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Brian Ristuccia <brian(at)ristuccia(dot)com>
Subject: Re: Intel SSDs that may not suck
Date: 2011-03-29 15:32:16
Message-ID: EF37296944B47C40ADDCCB7BFD6289FE04AE9D65C9@EMASC201VS01.exchad.jpmchase.net
Lists: pgsql-performance

This can be resolved by partitioning the disk with a larger write spare area so that the cells don't have to be recycled so often. There is a lot of "misinformation" about SSDs; there are some great articles on AnandTech that really explain how the technology works and some of the differences between the controllers as well. If you do the reading, you can find a solution that will work for you. SSDs are probably one of the best technologies to come along for us in a long time, giving us a huge performance jump in the IO world. We have gone from completely IO bound to CPU bound, so it's really worth spending the time to investigate and understand how this can impact your system.

http://www.anandtech.com/show/2614
http://www.anandtech.com/show/2738
http://www.anandtech.com/show/4244/intel-ssd-320-review
http://www.anandtech.com/tag/storage
http://www.anandtech.com/show/3849/micron-announces-realssd-p300-slc-ssd-for-enterprise
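
As a rough illustration of the spare-area idea above, here is a minimal sketch that works out how large a partition to create when a given share of the drive is left unpartitioned. The function name, the 1 MiB alignment, and the 160 GB / 20% figures are assumptions chosen for the example, not recommendations from this thread. Leaving space unpartitioned generally only helps if the drive sees those blocks as unused (a fresh or secure-erased drive).

```python
MIB = 1024 * 1024


def partition_bytes(capacity_bytes: int, spare_percent: int, align: int = MIB) -> int:
    """Return a partition size that leaves spare_percent of the drive unpartitioned.

    Hypothetical helper: the result is rounded down to a multiple of `align`
    so the partition ends on an aligned boundary.
    """
    if not 0 <= spare_percent < 100:
        raise ValueError("spare_percent must be in the range 0..99")
    usable = capacity_bytes * (100 - spare_percent) // 100
    return usable - (usable % align)


# Hypothetical 160 GB drive with 20% of its capacity held back as spare area.
size = partition_bytes(160 * 10**9, 20)
print(f"create a partition of {size} bytes ({size // MIB} MiB)")
```

The integer arithmetic avoids floating-point rounding on large byte counts. How much spare area is worth giving up depends on the controller and the write load, which is what the articles above discuss.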

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Jeff
Sent: Tuesday, March 29, 2011 9:33 AM
To: Jeff
Cc: Merlin Moncure; Andy; pgsql-performance(at)postgresql(dot)org; Greg Smith; Brian Ristuccia
Subject: Re: [PERFORM] Intel SSDs that may not suck

On Mar 29, 2011, at 10:16 AM, Jeff wrote:

> Now that all sounds awful and horrible until you get to overall
> performance, especially with reads - you are looking at 20k random
> reads per second with a few disks. Adding in writes does kick it down
> a notch, but you're still looking at 10k+ iops. That is the current
> trade off.
>

We've been doing a burn-in for about 4 days now on an array of 8 X25-Ms behind a P812 controller. Here's a sample of what it is currently doing (I have 10 threads randomly seeking, reading, and 10% of the time writing (then fsync'ing) out, using my pgiosim tool, which I need to update on pgfoundry):
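
For readers who want to reproduce a load of this shape, the following is a minimal sketch of the pattern described: several threads seeking to random blocks, reading, and about 10% of the time writing the block back and fsync'ing. It is not pgiosim and does not reflect how that tool is implemented. The 8 KB block size, the function names, and the defaults are assumptions, and the sketch needs a Unix-like system for `os.pread` and `os.pwrite`.

```python
import os
import random
import threading

BLOCK = 8192  # assumed block size (PostgreSQL's default page size)


def worker(fd, nblocks, ops, write_fraction, seed, counts, lock):
    """Read random blocks; occasionally write the block back and fsync."""
    rng = random.Random(seed)
    reads = writes = 0
    for _ in range(ops):
        offset = rng.randrange(nblocks) * BLOCK
        data = os.pread(fd, BLOCK, offset)
        reads += 1
        if rng.random() < write_fraction:
            # Write back the same bytes so the file contents do not change.
            os.pwrite(fd, data, offset)
            os.fsync(fd)
            writes += 1
    with lock:
        counts["reads"] += reads
        counts["writes"] += writes


def run_workload(path, threads=10, ops_per_thread=1000, write_fraction=0.10):
    """Run the mixed read/write load against an existing file and return counts."""
    nblocks = os.path.getsize(path) // BLOCK
    if nblocks == 0:
        raise ValueError("file must hold at least one block")
    counts = {"reads": 0, "writes": 0}
    lock = threading.Lock()
    fd = os.open(path, os.O_RDWR)
    try:
        pool = [
            threading.Thread(
                target=worker,
                args=(fd, nblocks, ops_per_thread, write_fraction, i, counts, lock),
            )
            for i in range(threads)
        ]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
    finally:
        os.close(fd)
    return counts
```

Calling `run_workload("/path/to/testfile")` on a file much larger than RAM keeps the page cache from absorbing the reads; on a small file the numbers mostly measure memory, not the disks.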

Time         DEV            tps   rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await   svctm   %util
10:25:24 AM  dev104-2   7652.21  109734.51  12375.22     15.96      8.22    1.07    0.12   88.32
10:25:25 AM  dev104-2   7318.52  104948.15  11696.30     15.94      8.62    1.17    0.13   92.50
10:25:26 AM  dev104-2   7871.56  112572.48  13034.86     15.96      8.60    1.09    0.12   91.38
10:25:27 AM  dev104-2   7869.72  111955.96  13592.66     15.95      8.65    1.10    0.12   91.65
10:25:28 AM  dev104-2   7859.41  111920.79  13560.40     15.97      9.32    1.19    0.13   98.91
10:25:29 AM  dev104-2   7285.19  104133.33  12000.00     15.94      8.08    1.11    0.13   92.59
10:25:30 AM  dev104-2   8017.27  114581.82  13250.91     15.94      8.48    1.06    0.11   90.36
10:25:31 AM  dev104-2   8392.45  120030.19  13924.53     15.96      8.90    1.06    0.11   94.34
10:25:32 AM  dev104-2  10173.86  145836.36  16409.09     15.95     10.72    1.05    0.11  113.52
10:25:33 AM  dev104-2   7007.14  100107.94  11688.89     15.95      7.39    1.06    0.11   79.29
10:25:34 AM  dev104-2   8043.27  115076.92  13192.31     15.95      9.09    1.13    0.12   96.15
10:25:35 AM  dev104-2   7409.09  104290.91  13774.55     15.94      8.62    1.16    0.12   90.55

the 2nd to last column is svctm (service time). the first column after dev104-2 is tps. if I kill the writes off, tps rises quite a bit:
Time         DEV            tps   rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await   svctm   %util
10:26:34 AM  dev104-2  22659.41  361528.71      0.00     15.95     10.57    0.42    0.04   99.01
10:26:35 AM  dev104-2  22479.41  359184.31      7.84     15.98      9.61    0.52    0.04   98.04
10:26:36 AM  dev104-2  21734.29  347230.48      0.00     15.98      9.30    0.43    0.04   95.33
10:26:37 AM  dev104-2  21551.46  344023.30    116.50     15.97      9.56    0.44    0.05   97.09
10:26:38 AM  dev104-2  21964.42  350592.31      0.00     15.96     10.25    0.42    0.04   96.15
10:26:39 AM  dev104-2  22512.75  359294.12      7.84     15.96     10.23    0.50    0.04   98.04
10:26:40 AM  dev104-2  22373.53  357725.49      0.00     15.99      9.52    0.43    0.04   98.04
10:26:41 AM  dev104-2  21436.79  342596.23      0.00     15.98      9.17    0.43    0.04   94.34
10:26:42 AM  dev104-2  22525.49  359749.02     39.22     15.97     10.18    0.45    0.04   98.04
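
As a sanity check on the figures above, here is a small sketch that converts one sample from each run into request size and throughput. It assumes the output comes from `sar -d`, where the read and write columns are in 512-byte sectors per second; the function name and the returned keys are made up for the example.

```python
SECTOR = 512  # sar -d reports rd_sec/s and wr_sec/s in 512-byte sectors
MIB = 1024 * 1024


def summarize(tps, rd_sec, wr_sec):
    """Derive request size, throughput, and write share from one sar -d sample."""
    total_sec = rd_sec + wr_sec
    return {
        "request_bytes": total_sec / tps * SECTOR,
        "read_mib_s": rd_sec * SECTOR / MIB,
        "write_mib_s": wr_sec * SECTOR / MIB,
        "write_share": wr_sec / total_sec,
    }


# First sample of the mixed run (10:25:24) and of the read-only run (10:26:34).
mixed = summarize(7652.21, 109734.51, 12375.22)
read_only = summarize(22659.41, 361528.71, 0.00)

for name, s in (("mixed", mixed), ("read-only", read_only)):
    print(
        f"{name}: {s['request_bytes']:.0f} bytes/request, "
        f"{s['read_mib_s']:.1f} MiB/s read, {s['write_mib_s']:.1f} MiB/s write, "
        f"{s['write_share']:.1%} of sectors written"
    )
```

The request size comes out at roughly 8 KB in both runs, and about a tenth of the sectors in the mixed run are writes, which lines up with the 10% write setting described earlier.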

now to demonstrate "write stalls" on the problematic box:
Time         DEV            tps   rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await   svctm   %util
10:30:49 AM  dev104-3      0.00       0.00      0.00      0.00      0.38    0.00    0.00   35.85
10:30:50 AM  dev104-3      3.03       8.08    258.59     88.00      2.43  635.00  333.33  101.01
10:30:51 AM  dev104-3      4.00       0.00    128.00     32.00      0.67  391.75   92.75   37.10
10:30:52 AM  dev104-3     10.89       0.00     95.05      8.73      1.45  133.55   12.27   13.37
10:30:53 AM  dev104-3      0.00       0.00      0.00      0.00      0.00    0.00    0.00    0.00
10:30:54 AM  dev104-3    155.00       0.00   1488.00      9.60     10.88   70.23    2.92   45.20
10:30:55 AM  dev104-3     10.00       0.00    536.00     53.60      1.66  100.20   45.80   45.80
10:30:56 AM  dev104-3     46.53       0.00    411.88      8.85      3.01   78.51    4.30   20.00
10:30:57 AM  dev104-3     11.00       0.00     96.00      8.73      0.79   72.91   27.00   29.70
10:30:58 AM  dev104-3     12.00       0.00     96.00      8.00      0.79   65.42   11.17   13.40
10:30:59 AM  dev104-3      7.84       7.84     62.75      9.00      0.67   85.38   32.00   25.10
10:31:00 AM  dev104-3      8.00       0.00    224.00     28.00      0.82  102.00   47.12   37.70
10:31:01 AM  dev104-3     20.00       0.00    184.00      9.20      0.24   11.80    1.10    2.20
10:31:02 AM  dev104-3      4.95       0.00     39.60      8.00      0.23   46.00   13.00    6.44
10:31:03 AM  dev104-3      0.00       0.00      0.00      0.00      0.00    0.00    0.00    0.00

that was from a simple dd, not random writes. (since it is in production, I can't really do the random write test as easily)
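
One way to spot stalls like these without reading the numbers by eye is to parse the samples and flag any second where the average wait is high. The sketch below assumes the `sar -d` column order (tps, rd_sec/s, wr_sec/s, avgrq-sz, avgqu-sz, await, svctm, %util) and copes with records wrapped over two lines. The field names and the 50 ms threshold are arbitrary choices for illustration.

```python
import re

FIELDS = ("tps", "rd_sec", "wr_sec", "avgrq_sz", "avgqu_sz", "await_ms", "svctm_ms", "util")
TIME = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def parse_sar(text):
    """Parse sar -d samples; a record may be wrapped across lines."""
    tokens = text.split()
    records = []
    i = 0
    # Each record is 11 tokens: time, AM/PM, device, then 8 numeric fields.
    while i + 11 <= len(tokens):
        if TIME.match(tokens[i]) and tokens[i + 1] in ("AM", "PM"):
            rec = dict(zip(FIELDS, (float(t) for t in tokens[i + 3:i + 11])))
            rec["time"] = tokens[i] + " " + tokens[i + 1]
            rec["dev"] = tokens[i + 2]
            records.append(rec)
            i += 11
        else:
            i += 1  # skip headers or stray tokens
    return records


def stalls(records, await_ms=50.0):
    """Return the samples whose average wait exceeds await_ms (assumed threshold)."""
    return [r for r in records if r["await_ms"] > await_ms]


# Three samples taken from the output above, in their wrapped form.
SAMPLE = """
10:30:50 AM dev104-3 3.03 8.08 258.59 88.00
2.43 635.00 333.33 101.01
10:30:53 AM dev104-3 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00
10:31:01 AM dev104-3 20.00 0.00 184.00 9.20
0.24 11.80 1.10 2.20
"""

for r in stalls(parse_sar(SAMPLE)):
    print(r["time"], r["dev"], "await", r["await_ms"], "ms at", r["tps"], "tps")
```

A fixed threshold is crude: seconds with zero tps also report zero await, so a fully stalled second can slip through, and a fuller check would also look at tps against the expected write rate.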

theoretically, a nice rotation of disks would remove that problem.
annoying, but it is the price you need to pay

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
