I've just been handed a hot potato by a colleague who left :( ... Our client has been complaining about slow performance on one of our servers.
I'm not very experienced in investigating performance issues, so I'm hoping someone will be kind enough to provide some guidance.
Here is an overview of the system:
-running Solaris 10 SPARC, multiple Sybase instances & apps (Java, Perl, financial software).
-kernel version: Generic_142900-13
$ uptime
1:23pm up 13 day(s), 17:34, 19 users, load average: 21.75, 22.65, 25.14
It has a huge amount of memory and plenty of CPUs:
# prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Fire E25K
System clock frequency: 150 MHz
Memory size: 163840 Megabytes
========================= CPUs =========================
            CPU       Run    E$     CPU      CPU
Slot ID     ID        MHz    MB     Impl.    Mask
--------    -------   ----   ----   -------  ----
/SB00/P0    0, 4      1800   32.0   US-IV+   2.2
/SB00/P1    1, 5      1800   32.0   US-IV+   2.2
/SB00/P2    2, 6      1800   32.0   US-IV+   2.2
/SB00/P3    3, 7      1800   32.0   US-IV+   2.2
/SB01/P0    32, 36    1350   16.0   US-IV    3.1
/SB01/P1    33, 37    1350   16.0   US-IV    3.1
/SB01/P2    34, 38    1350   16.0   US-IV    3.1
/SB01/P3    35, 39    1350   16.0   US-IV    3.1
/SB04/P0    128,132   1800   32.0   US-IV+   2.2
/SB04/P1    129,133   1800   32.0   US-IV+   2.2
/SB04/P2    130,134   1800   32.0   US-IV+   2.2
/SB04/P3    131,135   1800   32.0   US-IV+   2.2
/SB05/P0    160,164   1800   32.0   US-IV+   2.2
/SB05/P1    161,165   1800   32.0   US-IV+   2.2
/SB05/P2    162,166   1800   32.0   US-IV+   2.2
/SB05/P3    163,167   1800   32.0   US-IV+   2.2
/SB08/P0    256,260   1350   16.0   US-IV    3.1
/SB08/P1    257,261   1350   16.0   US-IV    3.1
/SB08/P2    258,262   1350   16.0   US-IV    3.1
/SB08/P3    259,263   1350   16.0   US-IV    3.1
But even with all that CPU power, the system still seems to be choking:
# sar -q
SunOS aubbwsyd01 5.10 Generic_142900-13 sun4u 03/24/2011
00:00:01 runq-sz %runocc swpq-sz %swpocc
00:05:02 26.4 72 0.0 0
00:10:02 25.9 71 0.0 0
00:15:02 27.4 73 0.0 0
00:20:01 27.3 62 0.0 0
00:25:01 25.5 66 0.0 0
00:30:02 26.9 75 0.0 0
00:35:01 36.1 60 0.0 0
00:40:02 28.5 64 0.0 0
00:45:01 30.6 58 0.0 0
00:50:02 30.0 64 0.0 0
00:55:02 30.4 59 0.0 0
01:00:02 26.7 64 0.0 0
...
12:45:02 29.5 78 0.0 0
12:50:01 27.4 90 0.0 0
12:55:01 29.7 79 0.0 0
13:00:03 30.7 76 0.0 0
13:05:01 30.4 86 0.0 0
13:10:03 34.6 81 0.0 0
13:15:01 26.8 84 0.0 0
13:20:02 30.4 77 0.0 0
13:25:01 31.6 72 0.0 0
Average 29.5 69 0.0 0
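To put those run queue numbers in perspective, they can be divided by the number of CPUs. This is only a rough sketch: it assumes the 20 dual-core chips listed by prtdiag give 40 virtual CPUs (which matches the 40 rows mpstat prints), and `runq_per_cpu` is just a name I made up for the illustration.

```python
# Rough sanity check: run-queue length per virtual CPU.
# Assumption: 20 chips x 2 cores (from the prtdiag listing) = 40 virtual CPUs.

def runq_per_cpu(runq_sz: float, ncpus: int) -> float:
    """Return the average number of runnable threads queued per CPU."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return runq_sz / ncpus


NCPUS = 20 * 2      # assumed from prtdiag: 20 chips, 2 cores each
AVG_RUNQ = 29.5     # "Average" row of the sar -q output

ratio = runq_per_cpu(AVG_RUNQ, NCPUS)
print(f"{ratio:.2f} runnable threads queued per CPU")
```

With the sar average of 29.5 that works out to a bit under one queued thread per CPU, which is part of why I'm unsure how to read the raw figure.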
# sar -r
SunOS aubbwsyd01 5.10 Generic_142900-13 sun4u 03/24/2011
00:00:01 freemem freeswap
00:05:02 586184 110438515
00:10:02 562080 113580170
00:15:02 547328 111934356
00:20:01 577790 111795786
00:25:01 597018 112950564
00:30:02 630584 110620673
00:35:01 649792 113179258
00:40:02 662950 110557264
00:45:01 658017 113512159
00:50:02 633167 110902038
00:55:02 644952 113924963
01:00:02 610516 112041306
...
12:45:02 348721 97869521
12:50:01 340880 96804395
12:55:01 339169 98490899
13:00:03 327440 99308450
13:05:01 336337 97280372
13:10:03 341150 99300626
13:15:01 345920 98246498
13:20:02 369102 99563900
13:25:01 387421 99101277
Average 627886 118480917
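As far as I understand it, sar -r reports freemem in pages rather than bytes, so here is a small conversion sketch. It assumes an 8 KB base page size, which is what I'd expect on sun4u but should be confirmed with the `pagesize` command; `pages_to_mib` is a hypothetical helper name.

```python
# Convert sar -r "freemem" (pages) into MiB.
# Assumption: 8 KB base page size on this sun4u box; verify with `pagesize`.
PAGE_SIZE_BYTES = 8192


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Return the size of `pages` memory pages in MiB."""
    return pages * page_size / (1024 * 1024)


# Two freemem values copied from the sar -r output.
for label, pages in (("13:25:01", 387421), ("Average", 627886)):
    print(f"{label}: {pages_to_mib(pages):.0f} MiB free")
```

For the 13:25:01 sample this gives roughly 3 GB, which is in the same ballpark as the 3221M free that top reports.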
# mpstat 5 2
... (2nd iteration below)
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2152 1 26484 926 336 1593 276 649 976 5 14654 34 44 0 22
1 2056 1 32114 796 285 1322 254 597 958 9 16580 38 43 0 20
2 1715 1 25972 888 323 1578 262 602 822 3 22862 33 46 0 21
3 1706 2 29307 724 279 1183 197 515 820 6 19937 40 39 0 21
4 1378 0 25992 816 313 1464 211 564 779 1 16577 43 35 0 22
5 1587 1 28487 808 302 1420 237 571 930 5 20051 31 48 0 21
6 1429 1 19215 765 286 1338 207 521 830 3 21779 38 39 0 24
7 1547 0 22940 801 293 1497 234 557 820 2 19536 35 44 0 22
32 1217 2 15876 1314 641 1125 287 555 574 3 5699 31 57 0 12
33 1304 3 23066 870 303 1469 307 664 603 3 7398 38 47 0 15
34 1459 1 25564 951 337 1565 330 691 660 3 8834 32 51 0 16
35 1282 2 22116 898 340 1565 280 633 585 3 7867 36 47 0 17
36 1255 1 20946 802 286 1296 285 583 567 3 9369 30 61 0 9
37 1348 0 23823 813 297 1426 260 581 601 3 7670 32 51 0 17
38 1028 1 21024 810 296 1434 258 588 551 4 6874 32 51 0 17
39 1065 1 21564 706 270 1321 192 512 771 1 7690 36 47 0 17
128 1517 1 25091 1059 375 1535 371 733 860 2 27353 41 44 0 16
129 1707 1 27668 927 334 1448 308 673 823 2 20142 39 44 0 17
130 1376 2 23294 866 318 1349 282 624 745 3 26822 37 46 0 17
131 1238 4 20804 895 322 1425 325 610 744 3 32165 46 39 0 15
132 1169 1 24721 780 283 1264 262 535 798 3 31841 47 39 0 14
133 1339 0 20148 789 289 1202 256 537 928 1 30757 46 41 0 13
134 1134 2 21571 862 315 1372 279 587 812 2 32827 46 38 0 16
135 1296 2 19052 898 331 1437 293 601 680 2 28036 43 39 0 18
160 1151 0 20643 730 241 1027 292 470 1065 3 57836 57 36 0 8
161 1094 0 13299 848 297 1188 323 473 1050 3 58257 45 46 0 10
162 1245 0 15682 923 330 1221 370 477 778 3 53849 49 42 0 9
163 927 0 9607 845 297 1145 370 423 678 2 69122 55 39 0 6
164 560 0 14091 4496 4033 1016 276 380 515 2 50642 50 42 0 9
165 675 0 18376 1595 1135 1002 259 377 662 2 62744 52 36 0 12
166 593 0 9206 901 331 1215 375 421 529 2 81789 59 33 0 8
167 838 0 24495 733 267 958 279 361 566 2 54789 53 35 0 12
256 1409 4 20748 878 309 1192 282 560 546 3 17693 36 49 0 16
257 1363 4 19532 848 298 1201 305 522 566 3 24880 39 48 0 13
258 1252 2 27165 865 322 1192 267 507 644 5 27032 32 52 0 15
259 1089 0 18189 902 379 1211 252 480 490 2 26119 36 47 0 17
260 1249 4 19819 1018 397 1508 303 570 468 3 28197 34 45 0 21
261 1081 6 18595 807 326 985 241 447 490 2 29507 34 51 0 14
262 1065 3 16197 882 351 1290 251 478 471 2 32525 33 48 0 19
263 1095 2 21474 1308 791 1218 237 477 562 3 26501 32 49 0 19
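Since mpstat gives one row per CPU, a small script can average the usr/sys/wt/idl columns so they are easier to compare with a single summary line. This is a minimal sketch that assumes the 16-column layout shown in the header; `mean_cpu_states` is a hypothetical helper, and the three sample rows are copied from the output for illustration only.

```python
from statistics import mean

# Column order assumed from the mpstat header line.
COLUMNS = ("CPU minf mjf xcal intr ithr csw icsw migr smtx srw "
           "syscl usr sys wt idl").split()


def mean_cpu_states(text: str) -> dict:
    """Average the usr/sys/wt/idl columns over all data rows in `text`."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not a full data row.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    if not rows:
        raise ValueError("no mpstat data rows found")
    return {key: mean(row[key] for row in rows)
            for key in ("usr", "sys", "wt", "idl")}


SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2152 1 26484 926 336 1593 276 649 976 5 14654 34 44 0 22
36 1255 1 20946 802 286 1296 285 583 567 3 9369 30 61 0 9
163 927 0 9607 845 297 1145 370 423 678 2 69122 55 39 0 6
"""

print(mean_cpu_states(SAMPLE))
```

Feeding it the full 40-row output instead of three rows would give figures to set against the "CPU states" line that top prints.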
# top
last pid: 13141; load avg: 24.9, 25.3, 25.3; up 13+17:46:55 13:36:00
1399 processes: 1382 sleeping, 1 running, 1 zombie, 15 on cpu
CPU states: 56.7% idle, 24.5% user, 18.8% kernel, 0.0% iowait, 0.0% swap
Memory: 160G phys mem, 3221M free mem, 281G swap, 276G free swap
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
3035 sybdev 176 0 0 12G 12G cpu 115.2H 122% dataserver
15934 sybase 264 0 0 12G 12G cpu 143.8H 108% dataserver
15440 sybase 264 0 0 12G 12G cpu 170.9H 98.04% dataserver
5436 sybdev 158 0 0 12G 12G cpu 195.5H 97.95% dataserver
15932 sybase 264 0 0 12G 12G cpu 50.0H 97.94% dataserver
2860 sybdev 264 0 0 12G 12G cpu 186.6H 88.24% dataserver
15955 sybase 264 0 0 12G 12G cpu 26.6H 79.29% dataserver
15966 sybase 264 4 0 12G 12G sleep 34.4H 59.64% dataserver
2902 sybdev 264 0 0 12G 12G cpu 101.1H 59.48% dataserver
15937 sybase 264 0 0 12G 12G cpu 140.2H 41.35% dataserver
19421 appdev 1 0 0 443M 411M sleep 836:14 33.02% perl
12074 appdev 999 59 0 3002M 2817M sleep 33.3H 31.77% java
24636 appdev 999 59 0 485M 432M sleep 18:12 31.40% java
27539 appdev 1 0 0 1843M 1655M cpu 46.6H 29.13% perl
10297 appdev 1 0 2 39M 19M cpu 104:16 28.15% perl
So I just can't figure out where these huge run queues are coming from. Can someone please tell me what I'm missing, or what the next thing to check would be?
Maybe it's staring me right in the face, but I just don't see it :wall:
Many thanks in advance!!