Re: Proposal of tunable fix for scalability of 8.4

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-13 16:42:30
Message-ID: 49BA8CF6.6090604@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Now with a modified Fix (not the original one that I proposed but
something that works like a heart valve : Opens and shuts to minimum
default way thus controlling how many waiters are waked up )

Time:Users:throughput: Reponse
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium Throughput: 149783.000 Avg Medium Resp: 0.024
2160: 288: Medium Throughput: 151743.000 Avg Medium Resp: 0.024
2220: 296: Medium Throughput: 155190.000 Avg Medium Resp: 0.026
2280: 304: Medium Throughput: 150955.000 Avg Medium Resp: 0.027
2340: 312: Medium Throughput: 147118.000 Avg Medium Resp: 0.029
2400: 320: Medium Throughput: 152768.000 Avg Medium Resp: 0.029
2460: 328: Medium Throughput: 161044.000 Avg Medium Resp: 0.028
2520: 336: Medium Throughput: 157926.000 Avg Medium Resp: 0.029
2580: 344: Medium Throughput: 161005.000 Avg Medium Resp: 0.029
2640: 352: Medium Throughput: 167274.000 Avg Medium Resp: 0.029
2700: 360: Medium Throughput: 168253.000 Avg Medium Resp: 0.031

With final vmstats improving but still far from 100%
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs us
sy id
38 0 0 46052840 39345096 0 11 0 0 0 0 0 0 0 0 0 134137 290703
303518 40 14 45
43 0 0 45656456 38882912 23 77 0 0 0 0 0 0 0 0 0 135820 272899
300749 40 15 45
38 0 0 45650488 38816984 23 80 0 0 0 0 0 0 0 0 0 135009 272767
300192 39 15 46
47 0 0 46020792 39187688 0 5 0 0 0 0 0 0 0 0 0 140473 285445
312826 40 14 46
24 0 0 46143984 39326848 9 61 0 0 0 0 0 0 0 0 0 146194 308590
328241 40 15 45
37 0 0 45465256 38757000 22 74 0 0 0 0 0 0 0 0 0 136835 293971
301433 38 14 48
35 0 0 46017544 39308072 12 61 0 0 0 0 0 0 0 0 0 142749 312355
320592 42 15 43
36 0 0 45456000 38744688 11 24 0 0 0 0 0 0 0 0 0 143566 303461
317683 41 15 43
23 0 0 46007408 39291312 2 22 0 0 0 0 0 0 0 0 0 140246 300061
316663 42 15 43
20 0 0 46029656 39281704 10 25 0 0 0 0 0 0 0 0 0 147787 291825
326387 43 15 42
24 0 0 46131016 39288528 2 21 0 0 0 0 0 0 0 0 0 150796 310697
335791 43 15 42
20 0 0 46109448 39269392 16 67 0 0 0 0 0 0 0 0 0 150075 315517
332881 43 16 41
30 0 0 45540928 38710376 9 27 0 0 0 0 0 0 0 0 0 155214 316448
341472 43 16 40
14 0 0 45987496 39270016 0 5 0 0 0 0 0 0 0 0 0 155028 333711
344207 44 16 40
25 0 0 45981136 39263008 0 10 0 0 0 0 0 0 0 0 0 153968 327343
343776 45 16 39
54 0 0 46062984 39259936 0 7 0 0 0 0 0 0 0 0 0 153721 315839
344732 45 16 39
42 0 0 46099704 39252920 0 15 0 0 0 0 0 0 0 0 0 154629 323125
348798 45 16 39
54 0 0 46068944 39230808 0 8 0 0 0 0 0 0 0 0 0 157166 340265
354135 46 17 37

But the real winner shows up in lockstat where it seems to indicate that
stress on Waiting from ProcArrayLock is relieved (thought shifting
somewhere else which is how lock works):

# ./84_lwlock.d 8042

Lock Id Mode State Count
WALWriteLock Exclusive Acquired 1
XidGenLock Exclusive Waiting 3
CLogControlLock Shared Waiting 11
ProcArrayLock Shared Waiting 39
CLogControlLock Exclusive Waiting 52
WALInsertLock Exclusive Waiting 73
CLogControlLock Shared Acquired 91
ProcArrayLock Exclusive Acquired 96
XidGenLock Exclusive Acquired 96
ProcArrayLock Exclusive Waiting 121
CLogControlLock Exclusive Acquired 199
WALInsertLock Exclusive Acquired 310
FirstBufMappingLock Shared Acquired 408
FirstLockMgrLock Shared Acquired 618
ProcArrayLock Shared Acquired 746
SInvalReadLock Shared Acquired 1542

Lock Id Mode State Combined Time (ns)
WALInsertLock Acquired 118673
CLogControlLock Acquired 172130
FirstBufMappingLock Acquired 177196
WALWriteLock Exclusive Acquired 208403
XidGenLock Exclusive Waiting 325989
FirstLockMgrLock Acquired 2667351
ProcArrayLock Acquired 8179335
XidGenLock Exclusive Acquired 8896177
CLogControlLock Shared Waiting 9680401
CLogControlLock Exclusive Waiting 19105179
CLogControlLock Exclusive Acquired 27484249
SInvalReadLock Acquired 43026960
FirstBufMappingLock Exclusive Acquired 45232906
ProcArrayLock Shared Waiting 46741660
WALInsertLock Exclusive Waiting 50912148
FirstLockMgrLock Exclusive Acquired 58789829
WALInsertLock Exclusive Acquired 86653791
ProcArrayLock Exclusive Waiting 213980787
ProcArrayLock Exclusive Acquired 270028367
SInvalReadLock Exclusive Acquired 303044735

SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr
sys wt idl sze
0 1 0 147238 159453 8806 370676 89236 71258 98435 0 380008
47 17 0 35 64
0 6 0 132463 143446 7975 331685 80847 64746 86578 0 329315
44 16 0 41 64
0 16 0 146655 158621 8987 366866 90756 69953 93786 0 349346
49 17 0 34 64
0 18 0 151326 163492 8992 377634 92860 72406 98968 4 365121
49 17 0 33 64
0 2 0 142914 154169 8243 352104 81385 69598 91260 0 340887
42 16 0 42 64
0 16 0 156755 168962 9080 386475 93072 74775 101465 0 379250
47 18 0 36 64
0 1 0 152807 165134 8880 379521 90671 75073 99692 0 380412
48 18 0 35 64
0 1 0 134778 146041 8122 339137 79888 66633 89220 0 342600
43 16 0 41 64
0 16 0 153014 164789 8834 376117 93000 72743 97644 0 371792
48 18 0 35 64

Not sure what SInvalReadLock does.. need to read up on that..

-Jignesh

>
> 1200: 160: Medium Throughput: 130487.000 Avg Medium Resp: 0.011
> 1260: 168: Medium Throughput: 123368.000 Avg Medium Resp: 0.013
> 1320: 176: Medium Throughput: 134649.000 Avg Medium Resp: 0.012
> 1380: 184: Medium Throughput: 136272.000 Avg Medium Resp: 0.013
>
>
> kthr memory page disk faults cpu
> r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs
> us sy id
> 3 0 0 44710008 39603320 0 135 0 0 0 0 0 0 0 0 0 110564 145678
> 211567 33 10 57
> 5 0 0 44663368 39595008 0 6 0 0 0 0 0 0 0 0 0 108891 143083
> 208389 33 10 58
> 3 0 0 44753496 39593824 0 132 0 0 0 0 0 0 0 0 0 109922 126865
> 209869 33 9 57
> 4 0 0 44788368 39588528 0 7 0 0 0 0 0 0 0 0 0 108680 129073
> 208068 33 10 57
> 2 0 0 44767920 39570592 0 147 0 0 0 0 0 0 0 0 0 106671 142403
> 204724 33 10 58
> 4 0 0 44762656 39563256 0 11 0 0 0 0 0 0 0 0 0 106185 130328
> 204551 34 10 57
> 2 0 0 44674584 39560912 0 148 0 0 0 0 0 0 0 0 0 104757 139147
> 201448 32 10 58
> 1 0 0 44619824 39551024 0 9 0 0 0 0 0 0 0 0 0 103653 142125
> 199896 32 10 58
> 2 0 0 44622480 39552432 0 141 0 0 0 0 0 0 0 0 0 101373 134547
> 195553 32 9 58
> 1 0 0 44739936 39552312 0 11 0 0 0 0 0 0 0 0 0 102932 121742
> 198205 33 9 58
>
>
> And lock stats are as follows at about 280 users sampling for a single
> backend process:
> # ./84_lwlock.d 29405
>
> Lock Id Mode State Count
> WALWriteLock Exclusive Acquired 1
> XidGenLock Exclusive Waiting 1
> CLogControlLock Shared Waiting 3
> ProcArrayLock Shared Waiting 7
> CLogControlLock Exclusive Waiting 9
> WALInsertLock Exclusive Waiting 45
> CLogControlLock Shared Acquired 52
> ProcArrayLock Exclusive Waiting 61
> XidGenLock Exclusive Acquired 96
> ProcArrayLock Exclusive Acquired 97
> CLogControlLock Exclusive Acquired 152
> WALInsertLock Exclusive Acquired 302
> ProcArrayLock Shared Acquired 729
> FirstLockMgrLock Shared Acquired 812
> FirstBufMappingLock Shared Acquired 857
> SInvalReadLock Shared Acquired 1551
>
> Lock Id Mode State Combined Time
> (ns)
> WALInsertLock Acquired
> 89909
> XidGenLock Exclusive Waiting
> 101488
> WALWriteLock Exclusive Acquired
> 140563
> CLogControlLock Shared Waiting
> 354756
> FirstBufMappingLock Acquired
> 471438
> FirstLockMgrLock Acquired
> 2907141
> XidGenLock Exclusive Acquired
> 7450934
> CLogControlLock Exclusive Waiting
> 11094716
> ProcArrayLock Acquired
> 15495229
> WALInsertLock Exclusive Waiting
> 20801169
> CLogControlLock Exclusive Acquired
> 21339264
> SInvalReadLock Acquired
> 24309991
> FirstLockMgrLock Exclusive Acquired
> 39904071
> FirstBufMappingLock Exclusive Acquired
> 40826435
> ProcArrayLock Shared Waiting
> 86352947
> WALInsertLock Exclusive Acquired
> 89336432
> SInvalReadLock Exclusive Acquired
> 252574515
> ProcArrayLock Exclusive Acquired
> 315064347
> ProcArrayLock Exclusive Waiting
> 847806215
>
> mpstat outputs is too much so I am doing aggegation by procesor set
> which is all 64 cpus
>
> -bash-3.2$ mpstat -a 10
>
> SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl
> usr sys wt idl sze
> 0 370 0 118649 127575 7595 244456 43931 62166 8700 0 158929
> 38 11 0 50 64
> 0 167 0 119668 128704 7644 246389 43287 62357 8816 0 161006
> 38 11 0 51 64
> 0 27 0 109461 117433 6997 224514 38562 56446 8171 0 148322
> 34 10 0 56 64
> 0 2 0 122368 131549 7871 250237 39620 61478 9082 0 165995
> 36 11 0 52 64
> 0 0 0 122025 131380 7973 249429 37292 59863 8922 0 166319
> 35 11 0 54 64
>
> (quick overview of columns )
> SET Processor set
> minf minor faults
> mjf major faults
> xcal inter-processor cross-calls
> intr interrupts
> ithr interrupts as threads (not counting clock
> interrupt)
> csw context switches
> icsw involuntary context switches
> migr thread migrations (to another processor)
> smtx spins on mutexes (lock not acquired on first
> try)
> srw spins on readers/writer locks (lock not
> acquired on first try)
> syscl system calls
> usr percent user time
> sys percent system time
> wt the I/O wait time is no longer calculated as a
> percentage of CPU time, and this statistic
> will always return zero.
> idl percent idle time
> sze number of processors in the requested proces-
> sor set
>
>
> -Jignesh
>
>

--
Jignesh Shah http://blogs.sun.com/jkshah
The New Sun Microsystems,Inc http://sun.com/postgresql

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Carey 2009-03-13 16:54:01 Re: Proposal of tunable fix for scalability of 8.4
Previous Message Kevin Grittner 2009-03-13 16:18:19 Re: Proposal of tunable fix for scalability of 8.4