Excessive Kernel CPU Usage

Good morning all, I've been a long-time lurker, but this is my first time posting.

About 6 months ago I started a new job with an AIX box. I had administered many Debian and Red Hat variant systems before, but this was my first AIX. It is an old box (Power4) that runs our ERP.

It had been running along reasonably well for years, and we recently started live-querying it from our web site for e-commerce.

That ran fine for the first week, but for the past week the server has been hammered. The stores are only open Friday through Sunday, and it runs fine outside of that. Once we open, wait times for the stores and the web site queries go through the roof.

We are running AIX 6.1 on it and our ERP is Pronto Xi.

The user-level CPU usually stays under 90%, but the kernel usage climbs into the 70% range. I have a feeling some config setting is increasing our overhead and causing these delays.

Thank you for the help!

---------- Post updated at 11:01 AM ---------- Previous update was at 10:48 AM ----------

Here is a topas shot of it this morning. Usage is at acceptable levels, but it still shows the kernel usage crazy high.

Topas Monitor for host:    weopronto            EVENTS/QUEUES    FILE/TTY
Mon Nov 18 10:00:24 2013   Interval:  2         Cswitch     270  Readch    60.3M
                                                Syscall  495.0K  Writech   41711
CPU  User%  Kern%  Wait%  Idle%                 Reads    269.5K  Rawin        17
2     45.0   55.0    0.0    0.0                 Writes      135  Ttyout      480
1     10.5   74.5    0.5   14.5                 Forks         9  Igets         1
3      3.5    5.0    2.5   89.0                 Execs         9  Namei       319
0      2.5    7.0    1.5   89.0                 Runqueue    2.0  Dirblk    11662
                                                Waitqueue   0.0
Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out                   MEMORY
Total     8.6     25.5    19.5     1.4     7.3  PAGING           Real,MB    8192
                                                Faults     1367  % Comp     23
Disk    Busy%     KBPS     TPS KB-Read KB-Writ  Steals        0  % Noncomp  76
Total     6.0     76.0    18.0     8.0    68.0  PgspIn        0  % Client    0
                                                PgspOut       0
FileSystem        KBPS     TPS KB-Read KB-Writ  PageIn        2  PAGING SPACE
Total             60.3K  269.4K  60.2K  33.6    PageOut       8  Size,MB    3072
                                                Sios         10  % Used      5
Name            PID  CPU%  PgSp Owner                            % Free     95
pronto      1536442  24.9   0.6 rhoward         NFS (calls/sec)
prospl      1425648  20.5   0.4 tbarnidg        SerV2         0  WPAR Activ    0
ksh         1331436   0.6   0.5 tbarnidg        CliV2         0  WPAR Total    0
pronto      1528192   0.4   1.0 bmccullo        SerV3         0  Press: "h"-help
pronto      1696028   0.2   0.7 rjohnson        CliV3         0         "q"-quit
topas        537036   0.2   4.6 root
pronto      1507430   0.2   0.9 mroisum
prospl      1589492   0.2   0.6 rfbjone9
httpd        389362   0.1   0.7 tbarnidg
gil           45088   0.0   0.1 root
llbd         192962   0.0   0.5 root
glbd         176608   0.0   0.8 root
nmbd         201152   0.0   0.5 root
swapper        4376   0.0   0.1 root
tcl          180656   0.0  59.0 root
random       135318   0.0   0.1 root
pronto       266268   0.0   0.7 tvassall
i4llmd       205286   0.0   1.0 root
i4lmd        127414   0.0   1.2 root
sendmail      98450   0.0   1.4 root


Try

tprof -skex sleep 60 

See....Java SDK_1vg0001475cb4a-1190e2e0f74-8000_1006.html

What are the pronto and prospl processes? Also note the syscall and cswitch values and monitor them over time.

Do you have a history of performance?
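
On the syscall numbers in the topas screen, it might be worth working out how much data each read call moves, since lots of very small reads would mean lots of trips into the kernel. A rough sketch in Python using the Syscall, Reads and Readch figures from the screen above; the helper name `parse_topas` is made up for illustration, and it assumes the K and M suffixes are decimal multipliers (the ratio shifts only slightly if they are binary):

```python
# Figures copied from the topas screen above (EVENTS/QUEUES and FILE/TTY columns).
# Assumption: "K" and "M" are decimal multipliers, not binary ones.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_topas(value: str) -> float:
    """Convert a topas figure such as '269.5K' or '41711' to a plain number."""
    value = value.strip()
    if value and value[-1] in SUFFIXES:
        return float(value[:-1]) * SUFFIXES[value[-1]]
    return float(value)


syscalls = parse_topas("495.0K")  # Syscall
reads = parse_topas("269.5K")     # Reads
readch = parse_topas("60.3M")     # Readch (bytes moved by read calls)

# Average payload of one read call, and how much of the syscall load is reads.
bytes_per_read = readch / reads
read_share = reads / syscalls

print(f"average bytes per read call: {bytes_per_read:.0f}")
print(f"reads as a share of all syscalls: {read_share:.0%}")
```

If the average comes out at only a few hundred bytes, that would point toward the application issuing many tiny reads rather than toward a system-wide config setting, but that is a hypothesis to check against tprof, not a conclusion.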

Those processes are our ERP. I am aware that they are taking the majority of our CPU usage. Historically it would run about 50-60% usage on a normal weekend.

I've actually been using tprof to monitor it; output for the top users is below.

Configuration information
=========================
System: AIX 6.1 Node: weopronto Machine: 0006830A4C00
Tprof command was:
    tprof -x sleep 60
Trace command was:
    /usr/bin/trace -ad -M -L 1331111116 -T 500000 -j 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,234,5D8, -o -      
Total Samples = 24020
Traced Time = 60.05s (out of a total execution time of 60.05s)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Process                                Freq  Total Kernel   User Shared  Other
=======                                ====  ===== ======   ==== ======  =====
/pronto/bin/prospl                       47  90.78  78.82  10.42   1.54   0.00
wait                                      4   2.84   2.84   0.00   0.00   0.00
/pronto/bin/pronto                        3   2.25   2.12   0.10   0.03   0.00
/usr/bin/ksh                             97   2.11   1.29   0.31   0.52   0.00
/bin/sh                                   7   0.34   0.28   0.05   0.01   0.00
/etc/syncd                                1   0.27   0.27   0.00   0.00   0.00
/usr/bin/date                            52   0.22   0.21   0.00   0.01   0.00
/usr/local/bin/awk                       39   0.16   0.16   0.00   0.00   0.00
gil                                       4   0.15   0.15   0.00   0.00   0.00
/usr/bin/curl                             3   0.09   0.06   0.00   0.03   0.00
httpd                                    12   0.09   0.07   0.01   0.01   0.00
/usr/bin/rm                              17   0.07   0.07   0.00   0.00   0.00
/usr/bin/tprof                            1   0.07   0.01   0.00   0.05   0.00
/usr/bin/printenv                        15   0.06   0.06   0.00   0.00   0.00
swapper                                   2   0.05   0.05   0.00   0.00   0.00
/usr/bin/sh                               7   0.05   0.02   0.01   0.03   0.00
/usr/bin/login                            2   0.04   0.03   0.00   0.01   0.00
/bin/ksh                                  1   0.04   0.01   0.01   0.01   0.00
send-mail                                 4   0.04   0.02   0.00   0.01   0.00

This one was taken this morning, when usage was lower and performance was better (though still not good), and it still shows the high kernel usage.
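
To put a number on that, here is a small Python sketch that takes the top rows of the tprof process summary above and works out what fraction of each process's samples were in kernel mode. The rows are pasted in by hand, and the function name `kernel_share` is just something made up for this example:

```python
# Top rows copied from the tprof process summary above.
# Columns: Process, Freq, Total, Kernel, User, Shared, Other (percent of samples).
ROWS = """\
/pronto/bin/prospl                       47  90.78  78.82  10.42   1.54   0.00
wait                                      4   2.84   2.84   0.00   0.00   0.00
/pronto/bin/pronto                        3   2.25   2.12   0.10   0.03   0.00
/usr/bin/ksh                             97   2.11   1.29   0.31   0.52   0.00
"""


def kernel_share(line: str) -> tuple[str, float]:
    """Return (process name, kernel samples as a fraction of that process's total)."""
    name, _freq, total, kernel, _user, _shared, _other = line.split()
    return name, float(kernel) / float(total)


shares = dict(kernel_share(line) for line in ROWS.splitlines() if line.strip())

for name, share in shares.items():
    print(f"{name:<22} {share:6.1%} of its samples in kernel mode")
```

For prospl this works out to roughly 87% of its own samples being spent in the kernel, so nearly all of the kernel time on the box is being driven by that one program.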