Skip site navigation (1) Skip section navigation (2)

Re: High CPU Utilization

From: Joe Uhl <joeuhl(at)gmail(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: High CPU Utilization
Date: 2009-03-16 20:50:04
Message-ID: F161EF4F-4B72-47AC-A894-C1C1BE7B7E1A@gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
I dropped the pool sizes and brought things back up.  Things are  
stable, site is fast, CPU utilization is still high.  Probably just a  
matter of time before issue comes back (we get slammed as kids get out  
of school in the US).

Now when I run vmtstat 1 30 it looks very different (below).  Waiting  
is minimal, user is very high.  Under nontrivial load, according to  
xact_commit in pg_stat_database we are doing 1800+ tps.

Appreciate the input and explanation on vmstat.  I am going to throw  
some of these numbers into zabbix so I can keep a better eye on them.   
This server is a couple years old so the purchase of a new controller  
and/or disks is not out of the question.

On final note, have made several changes to postgresql.conf.  Some of  
those here:
max_connections = 1000
shared_buffers = 7680MB
work_mem = 30MB
synchronous_commit = off
checkpoint_segments = 50
effective_cache_size = 20000MB

procs -----------memory---------- ---swap-- -----io---- -system-- ---- 
cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us  
sy id wa
  9  8  73036 500164  82200 23497748    3    4   669   541    1    1  
23  3 54 19
20  4  73036 497252  82200 23500836    0    0  2500   680 11145 15168  
91  4  2  2
21  1  73036 491416  82204 23503832    0    0  1916   920 10303 14032  
94  4  1  1
23  5  73036 489580  82212 23505860    0    0  1348  3296 11682 15970  
94  5  1  0
31  1  73036 481408  82220 23507752    0    0   984  8988 10123 11289  
97  3  0  0
25  4  73036 483248  82232 23509420    0    0  1268  1312 10705 14063  
96  4  0  0
23  4  73036 480096  82232 23512380    0    0  2372   472 9805 13996  
94  5  1  1
24  4  73036 476732  82236 23515196    0    0  2012   720 10365 14307  
96  3  1  0
22  1  73036 474468  82236 23516584    0    0   944  3108 9838 12831  
95  4  1  0
14  1  73036 455756  82284 23534548    0    0   908  3284 9096 11333  
94  4  1  0
10  2  73036 455224  82292 23536304    0    0  1760   416 12454 17736  
89  6  3  2
17  0  73036 460620  82292 23538888    0    0  1292   968 12030 18333  
90  7  2  1
13  4  73036 459764  82292 23539724    0    0   332   288 9722 14197  
92  5  2  1
17  5  73036 457516  82292 23542176    0    0  1872 17752 10458 15465  
91  5  2  1
19  4  73036 450804  82300 23545640    0    0  2980   640 10602 15621  
90  6  2  2
24  0  73036 447660  82312 23547644    0    0  1736 10724 12401 15413  
93  6  1  0
20  6  73036 444380  82320 23550692    0    0  2064   476 9008 10985  
94  4  1  0
22  2  73036 442880  82328 23553640    0    0  2496  3156 10739 15211  
93  5  1  1
11  1  73036 441448  82328 23555632    0    0  1452  3552 10812 15337  
93  5  2  1
  6  2  73036 439812  82348 23557420    0    0  1052  1128 8603 10514  
91  3  3  2
  6  3  73036 433456  82348 23560860    0    0  2484   656 7636 13033  
68  4 14 14
  6  3  73036 433084  82348 23562628    0    0  1400   408 6046 11778  
70  3 18  9
  5  0  73036 430776  82356 23564264    0    0  1108  1300 7549 13754  
73  4 16  7
  5  2  73036 430124  82360 23565580    0    0  1016  2216 7844 14507  
72  4 18  6
  4  2  73036 429652  82380 23567480    0    0  1168  2468 7694 15466  
58  4 24 14
  6  2  73036 427304  82384 23569668    0    0  1132   752 5993 13606  
49  5 36 10
  7  1  73036 423020  82384 23571932    0    0  1244   824 8085 18072  
56  3 30 10
procs -----------memory---------- ---swap-- -----io---- -system-- ---- 
cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us  
sy id wa
  4  0  73036 420816  82392 23573824    0    0  1292   820 5370 10958  
46  2 41 10
  9  1  73020 418048  82392 23576900   52    0  1632  2592 5931 11629  
60  3 29  8
  4  2  73004 415164  82424 23578620   56    0  1812  4116 7503 14674  
71  3 15 12

On Mar 16, 2009, at 4:19 PM, Dave Youatt wrote:
> Last column "wa" is % cpu time spent waiting (for IO to complete).   
> 80s
> and 90s is pretty high, probably too high.
>
> Might also want to measure the IO/s performance of your RAID
> controller.  From the descriptions, it will be much more important  
> that
> long sequential reads/writes for characterizing your workload.
>
> There are also some disappointing HW RAID controllers out there.
> Generally, Aretec and Promise are good, Adaptec good, depending on
> model, and the ones that Dell ship w/their servers haven't had good
> reviews/reports.
>
>
> On 03/16/2009 01:10 PM, Joe Uhl wrote:
>> Here is vmstat 1 30.  We are under peak load right now so I can  
>> gather
>> information from the real deal :)
>>
>> Had an almost complete lockup a moment ago, number of non-idle
>> postgres connections was 637.  Going to drop our JDBC pool sizes a  
>> bit
>> and bounce everything.
>>
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us  
>> sy
>> id wa
>> 12 35  95056 11102380  56856 14954948    3    4   669   541    1    2
>> 23  3 54 19
>> 12 39  95056 11092484  56876 14963204    0    0  6740  1204 10066
>> 13277 91  5  0  4
>> 8 42  95056 11081712  56888 14972244    0    0  8620  1168 10659  
>> 17020
>> 78  6  0 15
>> 10 30  95052 11069768  56904 14982628    0    0  8944   976 9809  
>> 15109
>> 81  6  1 12
>> 4 27  95048 11059576  56916 14991296    0    0  8852   440 7652 13294
>> 63  4  2 32
>> 5 42  95048 11052524  56932 14996496    0    0  4700   384 6383 11249
>> 64  4  4 28
>> 5 33  95048 11047492  56956 15001428    0    0  3852   572 6029 14010
>> 36  4  5 56
>> 7 35  95048 11041184  56960 15005480    0    0  3964   136 5042 10802
>> 40  3  1 56
>> 1 33  95048 11037988  56968 15009240    0    0  3892   168 3384 6479
>> 26  1  3 69
>> 3 28  95048 11029332  56980 15015744    0    0  6724   152 4964 12844
>> 11  2  8 79
>> 0 34  95048 11025880  56988 15020168    0    0  3852   160 3616 8614
>> 11  1  6 82
>> 3 25  95048 10996356  57044 15044796    0    0  7892   456 3126 7115
>> 4  3  8 85
>> 1 26  95048 10991692  57052 15050100    0    0  5188   176 2566 5976
>> 3  2 12 83
>> 0 29  95048 10985408  57060 15054968    0    0  4200    80 2586 6582
>> 4  1 12 83
>> 1 29  95048 10980828  57064 15058992    0    0  4560    64 2966 7557
>> 7  2  6 85
>> 2 28  95048 10977192  57072 15063176    0    0  3860    72 2695 6742
>> 11  1  7 81
>> 2 29  95048 10969120  57088 15067808    0    0  5084    84 3296 8067
>> 14  1  0 84
>> 0 25  95048 10962096  57104 15072984    0    0  4440   500 2721 6263
>> 12  1  6 80
>> 0 23  95044 10955320  57108 15079260    0    0  5712   232 2678 5990
>> 6  1  6 87
>> 2 25  95044 10948644  57120 15084524    0    0  5120   184 3499 8143
>> 20  3  9 69
>> 3 21  95044 10939744  57128 15090644    0    0  5756   264 4724 10272
>> 32  3  5 60
>> 1 19  95040 10933196  57144 15095024   12    0  4440   180 2585 5244
>> 13  2 15 70
>> 0 21  95040 10927596  57148 15098684    0    0  3248   136 2973 7292
>> 8  1  9 81
>> 1 20  95040 10920708  57164 15104244    0    0  5192   360 1865 4547
>> 3  1  9 87
>> 1 24  95040 10914552  57172 15105856    0    0  2308    16 1948 4450
>> 6  1  1 93
>> 0 24  95036 10909148  57176 15110240    0    0  3824   152 1330 2632
>> 3  1  6 90
>> 1 21  95036 10900628  57192 15116332    0    0  5680   180 1898 3986
>> 4  1 11 84
>> 0 19  95036 10888356  57200 15121736    0    0  5952   120 2252 3991
>> 12  1  8 79
>> 2 22  95036 10874336  57204 15128252    0    0  6320   112 2831 6755
>> 5  2  8 85
>> 3 26  95036 10857592  57220 15134020    0    0  5124   216 3067 5296
>> 32  6  3 59
>>
>> Alan, my apologies if you get this twice.  Didn't reply back to the
>> list on first try.
>>
>> On Mar 16, 2009, at 3:52 PM, Alan Hodgson wrote:
>>
>>> On Monday 16 March 2009, Joe Uhl <joeuhl(at)gmail(dot)com> wrote:
>>>> Right now (not under peak load) this server is running at 68% CPU
>>>> utilization and its SATA raid 10 is doing about 2MB/s writes and  
>>>> 11MB/
>>>> s reads.  When I run dd I can hit 200+MB/s writes and 230+ MB/s  
>>>> reads,
>>>> so we are barely using the available IO.  Further when I run dd the
>>>> CPU utilization of that process only approaches 20%-30% of one  
>>>> core.
>>>
>>> What does vmstat say when it's slow? The output of "vmstat 1 30"
>>> would be
>>> informative.
>>>
>>> note: dd is sequential I/O. Normal database usage is random I/O.
>>>
>>> -- 
>>> Even a sixth-grader can figure out that you can’t borrow money to  
>>> pay
>>> off
>>> your debt
>>>
>>> -- 
>>> Sent via pgsql-performance mailing list
>>> (pgsql-performance(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>>
>

On Mar 16, 2009, at 4:35 PM, Greg Smith wrote:

> On Mon, 16 Mar 2009, Joe Uhl wrote:
>
>> Here is vmstat 1 30.  We are under peak load right now so I can  
>> gather information from the real deal
>
> Quite helpful, reformatting a bit and picking an informative section:
>
> procs -----------memory----------    ---swap- ----io--- -system--  
> ----cpu----
> r  b   swpd     free   buff  cache   si   so   bi    bo   in   cs us  
> sy id wa
> 0 34  95048 11025880  56988 15020168 0    0  3852   160 3616 8614  
> 11  1  6 82
> 3 25  95048 10996356  57044 15044796 0    0  7892   456 3126 7115   
> 4  3  8 85
> 1 26  95048 10991692  57052 15050100 0    0  5188   176 2566 5976   
> 3  2 12 83
>
> This says that your server is spending all its time waiting for I/O,  
> actual CPU utilization is minimal.  You're only achieving around  
> 3-8MB/s of random I/O.  That's the reality of what your disk I/O  
> subsystem is capable of, regardless of what its sequential  
> performance with dd looks like.  If you were to run a more  
> complicated benchmark like bonnie++ instead, I'd bet that your  
> "seeks/second" results are very low, even though sequential read/ 
> write is fine.
>
> The Perc5 controllers have a pretty bad reputation for performance  
> on this list, even in RAID10.  Not much you can do about that beyond  
> scrapping the controller and getting a better one.
>
> What you might do in order to reduce the total number of writes  
> needed is some standard postgresql.conf tuning; see http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
> What you could do here is increase shared_buffers,  
> checkpoint_segments, and checkpoint_completion_target as described  
> there.  Having more buffers dedicated to the database and having  
> less checkpoints can result in less random I/O showing up, as  
> popular data pages will stay in RAM for longer without getting  
> written out so much.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com  
> Baltimore, MD


In response to

Responses

pgsql-performance by date

Next:From: Scott MarloweDate: 2009-03-16 22:09:13
Subject: Re: High CPU Utilization
Previous:From: Greg SmithDate: 2009-03-16 20:35:27
Subject: Re: High CPU Utilization

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group