High run queue (r), low CPU, low I/O, low network, low memory usage

Hello All

I have a system running AIX 6.1 in a shared uncapped partition (11 physical processors, 24 virtual processors, 72 GB of memory).

The output from nmon and vmstat shows a high run queue (60+) for continuous periods of time, but no paging, relatively low I/O (6000), CPU at about 40%, and low network traffic.

The only indicator is the high number of syscalls being made. Any thoughts would be appreciated.

What applications are running there? Is SMT activated? What OS level do you have (is the latest fix pack installed)? Does this happen always, or only at peak times? What has changed recently? More info would be helpful, thanks.

SMT is on; the system is a P720 (i.e. SMT=4) running AIX TL6-SP3-1048 (6100-06-03-1048).
It is a database server (Universe DB). No changes on the system except volume, i.e. more user activity.

That is the main issue: I do not have any more data to offer except my observations regarding the high syscall count. I was hoping to understand how I can see the syscalls being made; I ran a trace and fed it to curt, and the majority of calls are made to "unknown" routines.

Can you post the output of vmstat -v 2 10 taken during a peak time when your run queue is high, plus the iostat -Dl, vmstat -v and vmstat -s outputs, please? High run queues with low CPU utilization in most cases point to I/O or memory problems.
Regards
zxmaus

Here is some data. Am I right in reading it as: plenty of memory (no paging and tons in the fre column), relatively plentiful idle time, and no I/O wait?

---- --------------- ----------------- ------------------ -----------------------
 r b     avm     fre re pi po fr sr cy   in      sy    cs us sy id wa    pc    ec
12 1 3569398 5794339  0  0  0  0  0  0 4650 1770166 10110 62 20 18  0 11.32 103.0
10 1 3568772 5794931  0  0  0  0  0  0 4367 1500395  9389 51 21 28  0 10.18  92.5
18 0 3570140 5793348  0  0  0  0  0  0 4579 1797185 10839 64 21 15  0 11.46 104.2
27 0 3570626 5792608  0  0  0  0  0  0 4885 1927251 11404 63 21 16  0 11.45 104.1
32 0 3571451 5791641  0  0  0  0  0  0 4458 1842323 10604 62 21 16  0 11.33 103.0
16 2 3572631 5790341  0  0  0  0  0  0 3937 1536208  9027 58 20 22  0 10.88  98.9
22 0 3571930 5790930  0  0  0  0  0  0 4787 1489063 10280 54 21 25  0 10.75  97.7
19 1 3572992 5789819  0  0  0  0  0  0 4560 1575266 10585 57 21 22  0 10.86  98.7
 0 0 3575629 5787020  0  0  0  0  0  0 4551 1609988 10993 60 20 20  0 10.84  98.6
21 0 3578575 5783902  0  0  0  0  0  0 4735 1940362 11115 66 19 15  0 11.49 104.4
23 0 3579044 5783308  0  0  0  0  0  0 3812 1734049  9016 67 17 16  0 11.57 105.1
29 1 3579558 5782506  0  0  0  0  0  0 4539 1605220 10287 61 20 19  0 11.20 101.8
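
The reading of these samples can be checked with a quick script. This is only a minimal sketch in Python, not an AIX tool: the names `syscalls` and `sys` are hypothetical labels chosen to tell the two "sy" columns apart, and the free-memory figure assumes that fre is reported in 4 KB pages.

```python
# The twelve vmstat samples posted above, copied verbatim.
SAMPLE = """\
12 1 3569398 5794339 0 0 0 0 0 0 4650 1770166 10110 62 20 18 0 11.32 103.0
10 1 3568772 5794931 0 0 0 0 0 0 4367 1500395 9389 51 21 28 0 10.18 92.5
18 0 3570140 5793348 0 0 0 0 0 0 4579 1797185 10839 64 21 15 0 11.46 104.2
27 0 3570626 5792608 0 0 0 0 0 0 4885 1927251 11404 63 21 16 0 11.45 104.1
32 0 3571451 5791641 0 0 0 0 0 0 4458 1842323 10604 62 21 16 0 11.33 103.0
16 2 3572631 5790341 0 0 0 0 0 0 3937 1536208 9027 58 20 22 0 10.88 98.9
22 0 3571930 5790930 0 0 0 0 0 0 4787 1489063 10280 54 21 25 0 10.75 97.7
19 1 3572992 5789819 0 0 0 0 0 0 4560 1575266 10585 57 21 22 0 10.86 98.7
0 0 3575629 5787020 0 0 0 0 0 0 4551 1609988 10993 60 20 20 0 10.84 98.6
21 0 3578575 5783902 0 0 0 0 0 0 4735 1940362 11115 66 19 15 0 11.49 104.4
23 0 3579044 5783308 0 0 0 0 0 0 3812 1734049 9016 67 17 16 0 11.57 105.1
29 1 3579558 5782506 0 0 0 0 0 0 4539 1605220 10287 61 20 19 0 11.20 101.8
"""

# Column order as in the vmstat header. "syscalls" and "sys" are
# hypothetical names for the two columns that vmstat both labels "sy".
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "syscalls", "cs", "us", "sys", "id", "wa", "pc", "ec"]

PAGE_BYTES = 4096  # assumption: avm and fre are counted in 4 KB pages


def parse(text):
    """Turn whitespace-separated vmstat rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines and separators
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # skip the header line
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def mean(rows, key):
    """Arithmetic mean of one column over all samples."""
    return sum(row[key] for row in rows) / len(rows)


rows = parse(SAMPLE)
summary = {key: mean(rows, key)
           for key in ("r", "b", "us", "sys", "id", "wa", "pc", "syscalls", "cs")}
summary["max_r"] = max(row["r"] for row in rows)
summary["paging"] = sum(row["pi"] + row["po"] for row in rows)
summary["free_gib"] = mean(rows, "fre") * PAGE_BYTES / 2**30

for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

Averaging over the interval makes it easier to compare the run queue, the CPU split and the syscall rate than reading the rows one by one.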

The figures in the user CPU "us" column are high for a database system. Have you checked for looping orphan processes?

Is 60% (on average) deemed high for DB work? I was thinking that the "sy" (20%) was high; that is why I suspected that my syscalls were the bottleneck.

No looping processes were seen or noticed. All tasks ended, either normally or via a timeout.

Thank you all.

It is just a busy system, nothing wrong with it (though by the looks of it your I/O needs more buffer memory: check the vmstat -v output and tune accordingly).

If you have 24 virtual CPUs with SMT4, then up to 96 processes in the run queue are running directly on the CPU, so anything below 96 is no reason for concern in any way.

Run queue processes are processes either running on or waiting for a CPU; they are only a reason to look deeper when you are exceeding your thread count...
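
The 96 figure is simply virtual processors multiplied by SMT threads. A small sketch of that arithmetic follows; the function names are hypothetical, and the "compare r against the thread count" check is the rule of thumb from this post, not a documented AIX threshold.

```python
def logical_cpus(virtual_processors, smt_threads):
    """Hardware threads available to the partition: VPs x SMT threads."""
    return virtual_processors * smt_threads


def runnable_per_logical_cpu(run_queue, virtual_processors, smt_threads):
    """Average number of runnable threads per hardware thread."""
    return run_queue / logical_cpus(virtual_processors, smt_threads)


def queue_exceeds_threads(run_queue, virtual_processors, smt_threads):
    """Rule of thumb from this thread: worry only when r exceeds the thread count."""
    return run_queue > logical_cpus(virtual_processors, smt_threads)


VPS, SMT = 24, 4  # values given earlier in the thread

for rq in (32, 60, 96, 120):  # 32 and 60 are run queue values reported above
    ratio = runnable_per_logical_cpu(rq, VPS, SMT)
    flag = "look deeper" if queue_exceeds_threads(rq, VPS, SMT) else "ok"
    print(f"r={rq:>3}  per logical CPU={ratio:.2f}  {flag}")
```

With these numbers, even the reported peaks of 60+ leave less than one runnable thread per logical CPU.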

Regards
zxmaus

Yes, a pbufs increase is already in the pipeline, awaiting approval of the change record.

Interesting thought ("so anything below 96 is no reason for concern in any way") which I am still considering.

The run queue in vmstat (the r column) indicates the number of tasks that could not be dispatched within the time cycle (1/10 of a second); the b (blocked) column indicates tasks that could not be dispatched because they are not ready, for example because they are waiting for I/O. My blocked count in vmstat is 0.

So I am not quite sure I agree with the "no concern".

Perhaps I should ask: do you consider this system to be overutilized?

Thank you and I hope you do not mind my questions...
regards
Il-Malti

No, I actually don't consider your system overutilized. And the definition of the r + b row:

Regarding your high number of syscalls: yes, it is certainly worth looking into why they occur and what is generating them. You might want to start with the network traffic on the local host; check whether your applications are making "single row requests" to your database rather than fetching an array of results per request to process.

Regards
zxmaus