Hello,
We just purchased two new 4-way 5GHz Power6 servers (one active, one failover) with 64GB RAM (32GB per node) running AIX 6.1, with two LPARs per node, connected to our SAN with two 4GB HBAs. The PROD LPAR has 2 dedicated CPUs (4 virtual) and the TEST LPAR has 2 dedicated CPUs.
When we started parallel testing to move our production application to this server, I noticed that it didn't seem to be performing as fast as I thought it should compared to our existing server.
Our existing server is an 8-way, 1.6GHz Power5 with 32GB RAM (16GB per node) connected to our SAN with two 2GB HBAs. We have 5 physical CPUs dedicated to the PROD LPAR and two dedicated to the TEST LPAR.
I started by running the common performance monitoring tools during our parallel testing, like VMSTAT, MPSTAT, etc. For some reason, the System/OS is using about twice as much CPU as the User Processes. Everything I've ever seen or been told about UNIX Administration says that the System should not use more CPU than the User Processes. If it does, either the OS needs to be better tuned for the application it's running or there is some kind of bottleneck somewhere (CPU, I/O, Network).
So, the vendor (Not IBM) that installed the servers for us has not been able to explain or correct this after numerous changes to the filesystem, kernel settings, I/O buffers, etc.
VMSTAT does not show any obvious bottlenecks, other than that the OS seems to be using way too much CPU compared to the User Processes. r & b are less than the number of CPUs for the most part. wa is very low. pi/po are zero.
Here is a sample of the VMSTAT output during a test which represented about 20% of our production transaction volume going through the new server.
/>vmstat -w 5
System configuration: lcpu=8 mem=24576MB
 kthr      memory                page                faults             cpu
 r  b      avm      fre  re  pi  po  fr  sr  cy   in     sy    cs  us  sy  id  wa
 1  0  3340754  1940960   0   0   0   0   0   0   65  21903  1207   2   3  95   0
 2  0  3340770  1940918   0   0   0   0   0   0  260  43677  1654   3   6  90   1
 2  0  3340885  1940771   0   0   0   0   0   0  125  37038  1601   3   8  89   0
 1  0  3340742  1940897   0   0   0   0   0   0   75  24788  1290   2   5  93   0
 1  0  3340699  1940913   0   0   0   0   0   0   99  38021  1375   2   6  92   0
 1  0  3340685  1940898   0   0   0   0   0   0   97  34672  1424   2   5  93   0
 1  0  3340673  1940881   0   0   0   0   0   0  137  23928  1640   3   8  89   0
 1  0  3340634  1940881   0   0   0   0   0   0  135  39418  1615   3   6  91   0
 1  0  3341393  1940054   0   0   0   0   0   0  166  26856  1749   4   7  88   0
 1  0  3341378  1940035   0   0   0   0   0   0  106  35104  1301   2   5  93   0
 1  0  3341381  1940008   0   0   0   0   0   0   73  36011  1171   2   3  95   0
 1  0  3341407  1939948   0   0   0   0   0   0  101  23827  1330   2   5  93   0
 1  0  3341377  1939933   0   0   0   0   0   0  143  33983  1638   3   7  90   0
 0  0  3341394  1939876   0   0   0   0   0   0  249  38386  1634   3   6  90   0
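As a rough cross-check of the "about twice" figure, the us and sy columns of the sample above can be averaged with a short script. This is only a minimal sketch in Python: the function name cpu_split is made up for illustration, and it assumes every data row has 17 numeric fields with us, sy, id, wa as the last four, as in the header shown above.

```python
# Rows copied from the vmstat sample above (about 20% of production volume).
SAMPLE = """\
1 0 3340754 1940960 0 0 0 0 0 0 65 21903 1207 2 3 95 0
2 0 3340770 1940918 0 0 0 0 0 0 260 43677 1654 3 6 90 1
2 0 3340885 1940771 0 0 0 0 0 0 125 37038 1601 3 8 89 0
1 0 3340742 1940897 0 0 0 0 0 0 75 24788 1290 2 5 93 0
1 0 3340699 1940913 0 0 0 0 0 0 99 38021 1375 2 6 92 0
1 0 3340685 1940898 0 0 0 0 0 0 97 34672 1424 2 5 93 0
1 0 3340673 1940881 0 0 0 0 0 0 137 23928 1640 3 8 89 0
1 0 3340634 1940881 0 0 0 0 0 0 135 39418 1615 3 6 91 0
1 0 3341393 1940054 0 0 0 0 0 0 166 26856 1749 4 7 88 0
1 0 3341378 1940035 0 0 0 0 0 0 106 35104 1301 2 5 93 0
1 0 3341381 1940008 0 0 0 0 0 0 73 36011 1171 2 3 95 0
1 0 3341407 1939948 0 0 0 0 0 0 101 23827 1330 2 5 93 0
1 0 3341377 1939933 0 0 0 0 0 0 143 33983 1638 3 7 90 0
0 0 3341394 1939876 0 0 0 0 0 0 249 38386 1634 3 6 90 0
"""


def cpu_split(text):
    """Return (mean us, mean sy, sy/us ratio) over the vmstat data rows.

    Assumption: the last four fields of each row are us, sy, id, wa.
    """
    us_total = sy_total = rows = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and blank lines: data rows are 17 integers.
        if len(fields) != 17 or not all(f.isdigit() for f in fields):
            continue
        us_total += int(fields[-4])
        sy_total += int(fields[-3])
        rows += 1
    if rows == 0 or us_total == 0:
        raise ValueError("no usable vmstat rows")
    return us_total / rows, sy_total / rows, sy_total / us_total


mean_us, mean_sy, ratio = cpu_split(SAMPLE)
print(f"us={mean_us:.1f}% sy={mean_sy:.1f}% sy/us={ratio:.2f}")
```

For these 14 rows it works out to roughly 2.6% user and 5.7% system, a sy/us ratio of about 2.2.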
As we put more load on the machine, I thought that this might even out, but it didn't. Below is a VMSTAT from a test that represented about 200% of our production volume being processed by the new server.
System configuration: lcpu=8 mem=24576MB
 kthr      memory                page                faults             cpu
 r  b      avm      fre  re  pi  po  fr  sr  cy   in     sy    cs  us  sy  id  wa
 1  1  2323028  3038088   0   0   0   0   0   0  731  53814  7149  18  45  37   1
 3  0  2324120  3036759   0   0   0   0   0   0  825  54887  7107  20  43  36   1
 3  0  2324346  3036422   0   0   0   0   0   0  758  45717  5610  16  41  42   2
 2  1  2324357  3036295   0   0   0   0   0   0  932  52869  7709  17  46  36   1
 2  0  2324395  3036165   0   0   0   0   0   0  774  46603  5759  16  42  42   1
 2  0  2323100  3037244   0   0   0   0   0   0  893  52706  7509  17  45  37   2
 4  0  2324297  3035931   0   0   0   0   0   0  737  45806  5381  15  38  46   1
 3  0  2324751  3035377   0   0   0   0   0   0  773  53345  7091  18  46  35   1
 3  0  2324801  3035185   0   0   0   0   0   0  773  52399  7071  17  43  39   1
 2  0  2325211  3034652   0   0   0   0   0   0  615  46806  5469  17  41  42   1
 2  1  2325890  3033848   0   0   0   0   0   0  757  50556  6565  21  43  35   1
 2  0  2324992  3034627   0   0   0   0   0   0  712  51243  7530  13  41  45   1
 3  0  2325939  3033444   0   0   0   0   0   0  655  46586  5832  17  39  42   1
 3  1  2325297  3033969   0   0   8   0   0   0  659  52255  6002  19  42  38   1
 3  0  2325296  3033879   0   0   0   0   0   0  705  51447  6256  18  45  36   1
 4  0  2326345  3032446   0   0   0   0   0   0  566  58858  9930  13  43  44   1
 4  0  2326502  3032220   0   0   0   0   0   0  371  39132  3743  10  37  53   0
 3  1  2329518  3029111   0   0   0   0   0   0  595  55473  6341  22  45  33   1
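The statement that r & b stay below the number of CPUs can be checked the same way against the rows above. The sketch below is illustrative only: runq_over is a hypothetical helper, and which CPU count is the right yardstick is an assumption left open, so it tries all three figures from this post (8 logical from lcpu=8, 4 virtual, 2 dedicated physical).

```python
# Rows copied from the vmstat sample above (about 200% of production volume).
SAMPLE = """\
1 1 2323028 3038088 0 0 0 0 0 0 731 53814 7149 18 45 37 1
3 0 2324120 3036759 0 0 0 0 0 0 825 54887 7107 20 43 36 1
3 0 2324346 3036422 0 0 0 0 0 0 758 45717 5610 16 41 42 2
2 1 2324357 3036295 0 0 0 0 0 0 932 52869 7709 17 46 36 1
2 0 2324395 3036165 0 0 0 0 0 0 774 46603 5759 16 42 42 1
2 0 2323100 3037244 0 0 0 0 0 0 893 52706 7509 17 45 37 2
4 0 2324297 3035931 0 0 0 0 0 0 737 45806 5381 15 38 46 1
3 0 2324751 3035377 0 0 0 0 0 0 773 53345 7091 18 46 35 1
3 0 2324801 3035185 0 0 0 0 0 0 773 52399 7071 17 43 39 1
2 0 2325211 3034652 0 0 0 0 0 0 615 46806 5469 17 41 42 1
2 1 2325890 3033848 0 0 0 0 0 0 757 50556 6565 21 43 35 1
2 0 2324992 3034627 0 0 0 0 0 0 712 51243 7530 13 41 45 1
3 0 2325939 3033444 0 0 0 0 0 0 655 46586 5832 17 39 42 1
3 1 2325297 3033969 0 0 8 0 0 0 659 52255 6002 19 42 38 1
3 0 2325296 3033879 0 0 0 0 0 0 705 51447 6256 18 45 36 1
4 0 2326345 3032446 0 0 0 0 0 0 566 58858 9930 13 43 44 1
4 0 2326502 3032220 0 0 0 0 0 0 371 39132 3743 10 37 53 0
3 1 2329518 3029111 0 0 0 0 0 0 595 55473 6341 22 45 33 1
"""


def runq_over(text, cpus):
    """Return (data rows, rows where r + b exceeds cpus).

    Assumption: the first two fields of each data row are r and b.
    """
    total = over = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and blank lines: data rows are 17 integers.
        if len(fields) != 17 or not all(f.isdigit() for f in fields):
            continue
        total += 1
        if int(fields[0]) + int(fields[1]) > cpus:
            over += 1
    return total, over


# 8 = logical CPUs (lcpu=8), 4 = virtual, 2 = dedicated physical.
for cpus in (8, 4, 2):
    total, over = runq_over(SAMPLE, cpus)
    print(f"r+b > {cpus}: {over} of {total} samples")
```

Against 8 logical or 4 virtual CPUs, no sample in this run exceeds the threshold. Against the 2 dedicated physical CPUs, most of them do, so the conclusion depends on which count is used.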
Is this normal? Am I just wrong about what normal CPU utilization should be in an AIX LPAR environment?
Thanks so much!
Troy