I have a system running AIX 6.1 in a shared uncapped partition (11 physical processors, 24 virtual processors, 72 GB of memory).
The output from NMON and vmstat shows a high run queue (60+) for continuous periods of time, but NO paging, relatively low I/O (6000), CPU at 40%, and low network traffic.
The only indicator is the high number of syscalls being made. Any thoughts would be appreciated.
What applications are running there? Is SMT activated? What OS level do you have (is the latest fix pack installed)? Does this happen all the time, or only at peak times? What has changed since it started? More info would be helpful, thanks.
SMT is on; the system is a P720 (i.e. SMT=4) running AIX TL6-SP3-1048 (6100-06-03-1048).
It is a database server (Universe DB). No changes on the system except volume, i.e. more user activity.
That is the main issue: I do not have any more data to offer except observations regarding the high syscall count. I was hoping to understand how I can see the syscalls being made; I ran a trace and fed it to curt, and the majority of calls are made to "unknown" routines.
Can you please post the output of vmstat -v 2 10 taken during a peak time when your run queue is high, plus the iostat -Dl, vmstat -v, and vmstat -s outputs? High run queues with low CPU utilization in most cases point to I/O or memory problems.
Regards
zxmaus
Is 60% (on average) deemed high for DB work? I was thinking that the "sy" (20%) was high; that is why I thought my syscalls were the bottleneck.
No looping process was seen or noticed. All tasks ended, whether normally or via a timeout.
It is just a busy system; there is nothing wrong with it (though by the looks of it your I/O needs more buffer memory: check the vmstat -v output and tune accordingly).
If you have 24 virtual CPUs with SMT4, then up to 96 processes in the run queue are running directly on a CPU, so anything below 96 is no reason for concern in any way.
Run queue processes are processes either running on or waiting for a CPU. They are only a reason to look deeper when they exceed your hardware thread count ...
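For what it's worth, the arithmetic behind that rule of thumb can be written down as a small sketch. The function names are made up for illustration, and it assumes every virtual processor is dispatched with all of its SMT threads available; the numbers are the ones from this thread.

```python
def logical_cpus(virtual_processors: int, smt_threads: int) -> int:
    """Logical CPUs the partition sees: virtual processors x SMT threads."""
    return virtual_processors * smt_threads


def run_queue_exceeds_threads(run_queue: int,
                              virtual_processors: int,
                              smt_threads: int) -> bool:
    """True only when more threads are runnable than there are logical CPUs.

    This encodes the rule of thumb from the post above (an assumption,
    not an official AIX threshold).
    """
    return run_queue > logical_cpus(virtual_processors, smt_threads)


if __name__ == "__main__":
    # Values from this thread: 24 virtual processors, SMT4, run queue around 60.
    print(logical_cpus(24, 4))
    print(run_queue_exceeds_threads(60, 24, 4))
```

With the values from this thread it prints 96 and False, i.e. a run queue of 60 stays under the 96 logical CPUs.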
Yes, a pbufs increase is already in the pipeline, awaiting approval of the change record.
Interesting thought ("so anything below 96 is no reason for concern in any way"), which I am still considering.
The run queue in vmstat (the r column) indicates the number of tasks that could not be dispatched within the time cycle (1/10 of a second); the b (blocked) column indicates tasks that could not be dispatched because they are not ready, for example because they are waiting for I/O. My blocked count in vmstat is 0.
So I am not quite sure I agree with the "no concern".
Perhaps I should ask: do you consider this system to be overutilized?
Thank you and I hope you do not mind my questions...
regards
Il-Malti
No, I actually don't consider your system overutilized. And the definition of the r + b row:
Regarding your high number of syscalls: yes, it is for sure worth looking into why they are being made and what is generating them. You might want to start with your network traffic on the local host. Take a look at whether your applications are making "single row requests" to your database rather than fetching an array of results per request to process.
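To put rough numbers on the "single row requests" point, here is a minimal sketch that only counts round trips. The function name, the row count and the batch size are hypothetical and not taken from your system; the idea is simply that each request to the database costs at least a send and a receive on the socket, so fewer requests means fewer syscalls.

```python
import math


def round_trips(total_rows: int, rows_per_request: int) -> int:
    """Number of client/server requests needed to fetch total_rows."""
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if rows_per_request < 1:
        raise ValueError("rows_per_request must be at least 1")
    return math.ceil(total_rows / rows_per_request)


if __name__ == "__main__":
    rows = 10_000  # hypothetical result set size
    # One row per request versus an array of 500 rows per request.
    print(round_trips(rows, 1))
    print(round_trips(rows, 500))
```

For the hypothetical 10,000 rows this gives 10000 requests when fetching row by row, against 20 requests when fetching 500 rows at a time.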