Re: User concurrency thresholding: where do I look?

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-20 21:24:33
Message-ID: 46A12811.4020205@sun.com
Lists: pgsql-performance

True, you can't switch off the locks, since libthread has been folded
into libc in Solaris 10.

Anyway, just to give you an idea of the increase in context switching at
the break point, here is the mpstat output (taken at 10-second intervals)
on this 8-socket Sun Fire V890.

Involuntary context switches (icsw) stay low up to about the 950-1000
user mark. Above roughly 1000-1050 users a context switch storm starts,
and total throughput drops by about 30% instantaneously. I will try
rebuilding PostgreSQL with DTrace probes to get more clues.
(NOTE: you will see one CPU (cpuid 22) doing more system work; that's
the one handling the network interrupts.)

CPU minf mjf xcal  intr ithr   csw icsw migr smtx srw syscl usr sys  wt idl
  0   57   0   27   108    6  4072   98 1749  416   1  7763  47  13   0  40
  1   46   0   24    22    6  4198   11 1826  427   0  7547  45  13   0  42
  2   42   0   34   104    8  4103   91 1682  424   1  7797  46  13   0  41
  3   51   0   22    21    6  4125   10 1734  435   0  7399  45  13   0  43
  4   65   0   27    19    6  4015    8 1706  411   0  7292  44  15   0  41
  5   54   0   21    21    6  4297   10 1702  464   0  7708  45  13   0  42
  6   36   0   16    66   47  4218   12 1713  426   0  7685  47  11   0  42
  7   40   0  100   318  206  3699   10 1534  585   0  6851  45  14   0  41
 16   41   0   30    87    5  3780   78 1509  401   1  7604  45  13   0  42
 17   39   0   24    22    5  3970   12 1631  408   0  7265  44  12   0  44
 18   42   0   24    99    5  3829   89 1519  401   1  7343  45  12   0  43
 19   39   0   31 78830    5  3588    8 1509  400   0  6629  43  13   0  44
 20   22   0   20    19    6  3925    9 1577  419   0  7364  44  12   0  44
 21   38   0   31    23    5  3792   13 1566  407   0  7133  45  12   0  44
 22    8   0  110  7053 7045  1641    8  728  838   0  2917  16  50   0  33
 23   62   0   29    21    5  3985   10 1579  449   0  7368  44  12   0  44
CPU minf mjf xcal  intr ithr   csw icsw migr smtx srw syscl usr sys  wt idl
  0   13   0   27   123    6  4228  113 1820  433   1  8084  49  13   0  38
  1   16   0   63    26    6  4253   15 1875  420   0  7754  47  14   0  39
  2   11   0   31   110    8  4178   97 1741  425   1  8095  48  14   0  38
  3    8   0   24    20    6  4257    9 1818  444   0  7807  47  13   0  40
  4   13   0   54    28    6  4145   17 1774  426   1  7732  46  16   0  38
  5   12   0   35    23    6  4412   12 1775  447   0  8249  48  13   0  39
  6    8   0   24    38   15  4323   14 1760  422   0  8016  49  11   0  39
  7    8   0  120   323  206  3801   15 1599  635   0  7290  47  15   0  38
 16   11   0   44   107    5  3896   98 1582  393   1  7997  47  15   0  39
 17   15   0   29    24    5  4120   14 1716  416   0  7648  46  13   0  41
 18    9   0   35   113    5  3933  103 1594  399   1  7714  47  13   0  40
 19    8   0   34 83271    5  3702   12 1564  403   0  7010  45  14   0  41
 20    7   0   28    27    6  3997   16 1624  400   0  7676  46  13   0  41
 21    8   0   28    25    5  3997   15 1664  402   0  7658  47  12   0  41
 22    4   0   97  7741 7731  1586   11  704  906   0  2933  17  51   0  32
 23   13   0   28    25    5  4144   15 1658  437   0  7810  47  12   0  41
CPU minf mjf xcal  intr ithr   csw icsw migr smtx srw syscl usr sys  wt idl
  0    0   0  141   315    6  9262  301 2812  330   0 10905  49  16   0  35
  1    1   0  153   199    6  9400  186 2808  312   0 11066  48  16   0  37
  2    0   0  140   256    8  8798  242 2592  310   0 10111  47  15   0  38
  3    1   0  141   189    6  8803  172 2592  314   0 10171  47  15   0  39
  4    0   0  120   214    6  9540  207 2801  322   0 10531  46  17   0  36
  5    1   0  152   180    6  8764  161 2564  342   0  9904  47  15   0  38
  6    1   0  107   344  148  8180  181 2512  290   0  9314  51  14   0  35
  7    0   0  665   443  204  8733  153 2574  404   0  9892  43  21   0  37
 16    0   0  113   217    5  6446  201 1975  265   0  7552  45  12   0  44
 17    0   0  107   153    5  6568  140 2021  274   0  7586  44  11   0  45
 18    0   0  121   215    5  6072  201 1789  276   1  7690  44  12   0  44
 19    1   0  102 47142    5  6123  126 1829  262   0  7185  43  12   0  45
 20    0   0  102   143    6  6451  129 1939  262   0  7450  43  13   0  44
 21    1   0  106   150    5  6538  133 1997  285   0  7425  44  11   0  44
 22    0   0  494  5949 5876  3586   73 1040  399   0  4058  26  39   0  34
 23    0   0  102   159    5  6393  142 1942  324   0  7226  43  12   0  46
CPU minf mjf xcal  intr ithr   csw icsw migr smtx srw syscl usr sys  wt idl
  0    0   0  217   441    7 10763  426 3234  363   0 12449  47  18   0  35
  1    0   0  210   322    7 11113  309 3273  351   0 12527  46  17   0  37
  2    1   0  212   387    8 10306  370 2977  354   0 11320  45  16   0  38
  3    0   0  230   276    7 10332  257 2947  341   0 11901  43  16   0  40
  4    0   0  234   306    7 11324  290 3265  352   0 12805  45  18   0  37
  5    0   0  212   284    7 10590  262 3042  388   0 11789  44  17   0  39
  6    1   0  154   307   48  9583  241 2903  324   0 10564  50  15   0  35
  7    0   0  840   535  206 10354  247 3035  428   0 11700  42  22   0  37
 16    0   0  169   303    5  7446  286 2250  290   0  8361  42  13   0  45
 17    0   0  173   240    5  7640  225 2288  295   0  8674  41  13   0  47
 18    0   0  170   289    5  7445  270 2108  286   0  8167  41  12   0  47
 19    0   0  176 51118    5  7365  197 2138  288   0  7934  40  13   0  47
 20    1   0  172   222    6  7835  204 2323  298   0  8759  40  14   0  46
 21    0   0  167   233    5  7749  218 2339  326   0  8264  42  13   0  46
 22    0   0  749  6612 6516  4173   97 1166  421   0  4741  23  44   0  33
 23    0   0  181   239    6  7709  219 2258  383   0  8402  41  12   0  47
CPU minf mjf xcal  intr ithr   csw icsw migr smtx srw syscl usr sys  wt idl
  0    0   0  198   439    6 10364  417 3113  327   0 11962  49  17   0  34
  1    0   0  210   299    6 10655  282 3135  346   0 12463  47  17   0  36
  2    0   0  202   352    8  9960  332 2890  320   0 11261  47  16   0  37
  3    0   0  182   276    6  9950  255 2857  334   0 11021  46  16   0  38
  4    0   0  200   305    6 10841  286 3127  325   0 12440  48  18   0  35
  5    0   0  240   286    6  9983  272 2912  358   0 11450  46  16   0  37
  6    0   0  153   323   81  9062  233 2767  300   0  9675  49  18   0  33
  7    0   0  850   556  206 10027  271 2910  415   0 11048  43  22   0  35
 16    0   0  152   306    5  7261  291 2216  266   0  8055  44  12   0  44
 17    0   0  151   236    5  7193  217 2170  283   0  8099  43  12   0  45
 18    0   0  170   263    5  7008  246 2009  254   0  7836  43  12   0  46
 19    0   0  165 47738    5  6824  197 1989  273   0  7663  42  12   0  46
 20    0   0  188   217    6  7496  197 2222  280   0  8435  43  13   0  44
 21    0   0  179   248    5  7352  234 2233  309   0  8237  43  12   0  44
 22    0   0  813  6041 5963  4006   82 1125  448   0  4442  25  42   0  33
 23    0   0  162   241    5  7364  225 2170  355   0  7720  43  11   0  45
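The storm is easiest to see by totalling the csw and icsw columns per 10-second interval. Below is a minimal Python sketch, assuming the mpstat output has been saved as plain text with the 16 whitespace-separated columns shown above and one header line starting with "CPU" per interval. interval_totals and SAMPLE are hypothetical names for this illustration, not part of any Solaris tool; rows that a mail client wrapped onto two lines are reassembled before counting.

```python
from typing import Dict, Iterable, List

# Column order of a Solaris mpstat per-CPU row, matching the header line.
COLUMNS = ["CPU", "minf", "mjf", "xcal", "intr", "ithr", "csw", "icsw",
           "migr", "smtx", "srw", "syscl", "usr", "sys", "wt", "idl"]


def interval_totals(lines: Iterable[str]) -> List[Dict[str, int]]:
    """Sum csw and icsw over all CPUs, one dict per mpstat interval.

    Every line whose first field is "CPU" opens a new interval. Tokens are
    buffered until a full 16-column row is available, so a row wrapped onto
    two lines is reassembled before it is counted.
    """
    totals: List[Dict[str, int]] = []
    pending: List[str] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            totals.append({"cpus": 0, "csw": 0, "icsw": 0})
            pending = []
            continue
        if not totals:
            continue  # ignore anything before the first header line
        pending.extend(fields)
        while len(pending) >= len(COLUMNS):
            row = dict(zip(COLUMNS, map(int, pending[:len(COLUMNS)])))
            del pending[:len(COLUMNS)]
            totals[-1]["cpus"] += 1
            totals[-1]["csw"] += row["csw"]
            totals[-1]["icsw"] += row["icsw"]
    return totals


# Two rows each from the first and fourth intervals of the table above;
# the fourth-interval rows are deliberately wrapped onto two lines to
# exercise the reassembly path.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 57 0 27 108 6 4072 98 1749 416 1 7763 47 13 0 40
1 46 0 24 22 6 4198 11 1826 427 0 7547 45 13 0 42
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 217 441 7 10763 426 3234 363 0 12449 47 18
0 35
1 0 0 210 322 7 11113 309 3273 351 0 12527 46 17
0 37
"""

if __name__ == "__main__":
    for i, t in enumerate(interval_totals(SAMPLE.splitlines()), start=1):
        print(f"interval {i}: {t['cpus']} CPUs, "
              f"csw/CPU={t['csw'] / t['cpus']:.0f}, "
              f"icsw/CPU={t['icsw'] / t['cpus']:.0f}")
```

Dividing each total by the CPU count gives a per-CPU rate that is easy to compare across intervals: in the full table, csw per CPU (leaving out CPU 22) is about 3,600 to 4,400 in the first two intervals and about 6,000 to 11,300 in the last three.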

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> What it's saying is that there are holds/waits when trying to acquire
>> locks at the Solaris user-library level, called from the PostgreSQL
>> functions. For example, both of the following functions are hitting
>> the same mutex lock 0x10059e280 in a Solaris library call:
>> postgres`AllocSetDelete+0x98
>> postgres`AllocSetAlloc+0x1c4
>>
>
> That's a perfect example of the sort of useless overhead that I was
> complaining of just now in pgsql-patches. Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded. And yet, apparently you can't turn
> that off in Solaris.
>
> (Fortunately, the palloc layer is probably insulating us from malloc's
> performance enough that this isn't a huge deal. But it's annoying.)
>
> regards, tom lane
>
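Tom's remark that the palloc layer insulates the backend from malloc's mutex can be illustrated with a toy model. The Python sketch below is not PostgreSQL's AllocSet code: LockedHeap and Arena are hypothetical names, an ordinary threading.Lock stands in for the mutex inside Solaris libc malloc, and the 8 KB block size is an arbitrary assumption. It shows the general arena (memory-context) technique: small requests are carved out of large blocks, so the locked allocator is entered once per block rather than once per allocation, and everything is released at once on reset.

```python
from __future__ import annotations

import threading


class LockedHeap:
    """Hypothetical stand-in for a libc malloc that locks a mutex per call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def malloc(self, size: int) -> bytearray:
        with self._lock:
            self.lock_acquisitions += 1
            return bytearray(size)


class Arena:
    """Hypothetical palloc-style context: carve small requests out of big blocks.

    Only block refills and oversized requests go through the locked heap;
    reset() drops every block at once instead of freeing chunk by chunk.
    """

    def __init__(self, heap: LockedHeap, block_size: int = 8192) -> None:
        self.heap = heap
        self.block_size = block_size
        self.blocks: list[bytearray] = []      # every block taken from the heap
        self.current: bytearray | None = None  # block currently being carved up
        self.offset = 0

    def alloc(self, size: int) -> memoryview:
        if size > self.block_size:
            # Too big to share a block: give it a dedicated allocation.
            block = self.heap.malloc(size)
            self.blocks.append(block)
            return memoryview(block)
        if self.current is None or self.offset + size > self.block_size:
            # Current block exhausted (or none yet): one locked call refills it.
            self.current = self.heap.malloc(self.block_size)
            self.blocks.append(self.current)
            self.offset = 0
        start = self.offset
        self.offset += size
        return memoryview(self.current)[start:start + size]

    def reset(self) -> None:
        # Release all chunks at once by dropping the blocks that hold them.
        self.blocks.clear()
        self.current = None
        self.offset = 0


if __name__ == "__main__":
    direct = LockedHeap()
    for _ in range(10_000):
        direct.malloc(64)

    heap = LockedHeap()
    arena = Arena(heap)
    for _ in range(10_000):
        arena.alloc(64)

    print(direct.lock_acquisitions)  # 10000
    print(heap.lock_acquisitions)    # 79: 128 chunks of 64 bytes per 8 KB block
```

The lock is still taken on every block refill, so in this model the mutex traffic is reduced rather than removed.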
