I am having trouble getting nmon to print the actual CPU usage of a process over an interval.
According to the manual, something like
# time nmon -t -C cron -s 5 -c 2 -F outfile
real 0m0.98s
user 0m0.03s
sys 0m0.04s
should print out at least the process information about cron for an interval of 2 x 5 seconds.
I tried it without specifying a process (without -C) and with other parameters, but no luck. I get the very general information about everything else in the output, but nothing about any processes.
What I also do not understand is why it always finishes in much less time than I specified with -s and -c.
I am currently on AIX 6100-06-05-1115, and I am not root. However, when I start nmon in its online mode and press "t", I do get the top view as a non-root user.
Any help is welcome. Alternatives for getting the current CPU usage of a process over a specified interval are welcome, too.
I also tried to get the information with pprof, but like ps it seems to show values (ACCT_TIME) that do not work for me at all, as this seems to be the usage accumulated since the process was started, which is not what I am looking for. I also checked tprof, but it looks like it only works for processes that are started with it, not for processes that are already running.
In the IBM DeveloperWorks Wiki I found Nigel Griffiths' entry for a C-program to get the process information (IBM Developer)
He states that you have to take at least two measurements and calculate the difference (I guess you also have to relate this to other processes etc., since the values I got did not tell me much).
I am looking for an easier way, if there is one.
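The two-measurement idea mentioned above can be sketched in plain shell, without a C program. This is only a sketch under assumptions: it assumes that ps on the box accepts the POSIX "-o time=" format and prints the cumulative CPU time as [[dd-]hh:]mm:ss. The function names (cputime_to_secs, interval_cpu_pct, sample_pid) are made up for illustration, and the return code 3 is meant as Nagios UNKNOWN.

```sh
#!/bin/sh
# Sketch only: interval CPU usage of an already running process,
# derived from two samples of its cumulative CPU time.

# Convert a ps TIME value ([[dd-]hh:]mm:ss) to seconds.
cputime_to_secs() {
    echo "$1" | awk -F'[-:]' '{
        mult[0] = 1; mult[1] = 60; mult[2] = 3600; mult[3] = 86400
        secs = 0
        # Walk the fields from the right: seconds, minutes, hours, days.
        for (i = 0; i < NF; i++) secs += $(NF - i) * mult[i]
        print secs
    }'
}

# Percentage of one CPU used between two samples.
# $1 = CPU seconds before, $2 = CPU seconds after, $3 = interval in seconds.
interval_cpu_pct() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", 100 * (b - a) / t }'
}

# Sample PID $1 twice, $2 seconds apart (default 10), and print the percentage.
# Returns 3 if the process cannot be found.
sample_pid() {
    pid=$1
    interval=${2:-10}
    t1=$(ps -o time= -p "$pid") && [ -n "$t1" ] || return 3
    sleep "$interval"
    t2=$(ps -o time= -p "$pid") && [ -n "$t2" ] || return 3
    interval_cpu_pct "$(cputime_to_secs "$t1")" "$(cputime_to_secs "$t2")" "$interval"
}
```

Calling sample_pid with a PID and 10 would print the share of one CPU used during those 10 seconds. Since the TIME column has a resolution of whole seconds, short intervals give coarse results, and a multi-threaded process can report more than 100.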
NMON does not seem to work properly with the process option "-C" and recording mode "-f". It only shows the TOP processes.
If you specify a recording option "-f", the nmon process goes to the background (under init) and your command "time nmon -t -C cron -s 5 -c 2 -F outfile" returns immediately.
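Because the recording runs in the background, a script has to wait for it to finish before reading the output file. A minimal sketch of that wait follows. The helper name wait_for_pid is hypothetical, and the commented nmon call rests on assumptions: that -p prints the PID of the background process on stdout (the help text only says that nmon "outputs" it) and that process lines in the file start with TOP.

```sh
#!/bin/sh
# Sketch only: block until a background recorder (here: nmon) has exited.

# Wait until PID $1 is gone, polling once per second, for at most $2 seconds.
# Returns 0 if the process ended, 1 on timeout.
# kill -0 only tests for existence; it works for processes of the same user.
wait_for_pid() {
    pid=$1
    left=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$left" -gt 0 ] || return 1
        sleep 1
        left=$((left - 1))
    done
    return 0
}

# Hypothetical usage, left commented out because it needs nmon on AIX.
# -F comes first, as the nmon help text asks for.
#   out=/tmp/nmon.$$
#   pid=$(nmon -F "$out" -t -s 5 -c 2 -p)
#   wait_for_pid "$pid" 30 && grep '^TOP' "$out"
#   rm -f "$out"
```

The timeout is there so that a monitoring plugin never hangs if the recorder does not exit as expected.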
Thanks, but the options I used are valid switches that are displayed when I issue nmon -h or check the man page, as well as in the documentation in the IBM Wiki.
@-=Xray=-
Thanks, but tprof is for monitoring a program that you start, like sleep. I have processes that are already running, and I have to check them over an interval of, let's say, 10 seconds and get their actual CPU usage for exactly this interval.
I don't want the usage since the process came to life (that would be regular ps), nor do I have the option to start the program myself.
As for the explanation of -f together with -c and -s for nmon: it seems a pity that this combination is accepted, because it looks to me as if the output file is written once and nothing is added to it 5 or 10 seconds later.
But I got your point, and I think I have tried that already. I will have another look into it, but IIRC the values I got back were not what I was looking for.
The background of all this is that I am about to write a plugin for Nagios which should just capture the actual CPU usage of a process over the specified interval.
The plugins I have found usually issue a ps and work with values that are the average since the process came to life.
When the process has been alive for several days and you are currently at a workload peak, you get back, for example, just 12% CPU usage while it is actually 77%.
So this approach is, in my eyes, "useless".
Maybe I explained it better now. If you have any other ideas, let me know. Thanks a lot to both of you for your efforts so far.
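To make the 12% versus 77% gap concrete, here is a small worked example with made-up numbers. The process lifetime, the CPU seconds and the function name cpu_pct are all hypothetical and only chosen so that the two percentages from the post come out.

```sh
#!/bin/sh
# Sketch with made-up numbers: lifetime average versus interval usage.

# Percentage of one CPU: CPU seconds used ($1) over wall-clock seconds ($2).
cpu_pct() {
    awk -v used="$1" -v wall="$2" \
        'BEGIN { printf "%.1f\n", 100 * used / wall }'
}

# Hypothetical process: alive for 5 days (432000 s), 51840 CPU seconds in total.
echo "lifetime average: $(cpu_pct 51840 432000)%"

# The same process burned 7.7 CPU seconds during the last 10 seconds.
echo "last 10 seconds:  $(cpu_pct 7.7 10)%"
```

The first line prints 12.0% and the second 77.0%: a few seconds of heavy load hardly move an average taken over days, which is why a plugin needs the difference between two samples.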
My remaining doubt is this: I have no process that has been running for a long time and is peaking right now, so I cannot verify that the values tprof shows reflect the interval and not, once again, just some kind of average.
When I have an infinite loop running, it will be at a certain level for its whole life span. So I can't tell whether the output reflects the actual CPU usage in the interval or just the average, since it was always at this level.
I checked boxes in my environment that have some amount of traffic, but sadly the processes involved are very jumpy in their CPU usage and are therefore not suitable for my test to prove the output of tprof.
If you have such processes that have been running for some hours or days already, you could check that for me, if you like.
Sure, here you go (nmon comes as part of AIX these days, same as topas, or has replaced it):
notroot@somehost:/home/notroot# nmon -v
nmon version TOPAS-NMON build AIX
notroot@somehost:/home/notroot# oslevel -s
6100-06-05-1115
notroot@somehost:/home/notroot# which nmon
/usr/bin/nmon
notroot@somehost:/home/notroot# lslpp -w /usr/bin/nmon
File Fileset Type
----------------------------------------------------------------------------
/usr/bin/nmon bos.perf.tools File
notroot@somehost:/home/notroot# nmon -h
Hint: topas_nmon [-h] [-s <seconds>] [-c <count>] [-f -d -t -r <name>] [-x]
Command: TOPAS_NMON
-h FULL help information - much more than here
Interactive-Mode:
read startup banner and type: "h" once it is running
For Data-Collect-Mode (-f)
-f spreadsheet output format [note: default -s300 -c288]
optional
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-t spreadsheet includes top processes
-x capacity planning (15 min for 1 day = -fdt -s 900 -c 96)
For Interactive-Mode
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-g <filename> User decided Disk Groups
- file = on each line: group_name <hdisk_list> space separated
- like: rootvg hdisk0 hdisk1 hdisk2
- upto 32 groups hdisks can appear more than once
-b black and white [default is colour]
-B no boxes [default is show boxes]
example: topas_nmon -s 1 -c 100
For Data-Collect-Mode = spreadsheet format (comma separated values)
Note: use only one of f,F,z,x or X and make it the first argument
-f spreadsheet output format [note: default -s300 -c288]
output file is <hostname>_YYYYMMDD_HHMM.nmon
-F <filename> same as -f but user supplied filename
-r <runname> goes into spreadsheet file [default hostname]
-t include top processes in the output
-T as -t plus saves command line arguments in UARG section
-Y like -t but all commands with the same name are added up and reported
Note: you can have only one of -t, -T or -Y (last on the cmd line wins)
-s <seconds> between snap shots
-c <number> of refreshes
-w <number> Timestamp size (Tnnnn), values4 to 16, for analyser use 4 or 8
-l <dpl> disks/line default 150 to avoid spreadsheet issues. For EMC use 64
-g <filename> User decided Disk Groups (see above -g)
-d Include Disk Service time sections
-k <disklist> Only report these disks also works online (Example -k hdisk3,hdisk23,hdisk44)
-G Use UTC/GMT standard time (not local time)
-K Include RAW Kernel & LPAR sections (RAWLPAR & RAWCPUTOTAL)
-D Skip disk configuration sections
-E Skip ESS configuration sections
-J Skip JFS sections
-V Include disk Volume Group section
-P Include Paging Space section
-M Include MEMPAGES section = detailed memory stats per page size
-N Include NFS section, use -NN for NFSv4 stats.
-W Include WLM sections
-S Include WLM sections with SubClasses
-^ Include Fibre Channel (FC) sections
-O Include Shared Ethernet Adpater (SEA) VIOS only sections
-L Include LARGE page section
-I <percent> Ignore process percent threshold (default 0.1)
don't save TOP stats if proc using less CPU than this %
-A Include Async I/O Section
-m <dir> nmon changes to this directory before saving data to a file
-Z <priority> set nice priority -20=important to 20=unimportant (negative only for root user)
example: collect for 1 hour at 30 second intervals with top procs
topas_nmon -f -t -r Test1 -s30 -c120
To load into a spreadsheet like Lotus 1-2-3:
sort -A *nmon >stats.csv
transfer the stats.csv file to your PC
Start 1-2-3 and then Open <char-separated-value ASCII file>
Capacity planning mode - use cron to run each day
-x sensible spreadsheet output for CP = one day
every 15 mins for 1 day ( i.e. -ft -s 900 -c 96)
-X sensible spreadsheet output for CP = busy hour
every 30 secs for 1 hour ( i.e. -ft -s 30 -c 120)
Set-up and installation
To enable disk stats as root: chdev -l sys0 -a iostat=true
- this adds the disk % busy numbers (otherwise they are zero)
If you have hundreds of disk this can take 1% to 2% CPU
Interactive Mode Commands
key --- Toggles to control what is displayed ---
h = Online help information
r = Resources pSeries type, machine name, cache details and AIX version + LPAR
p = Partitions stats
c = CPU by processor stats with bar graphs
#=toggle PURR based stats (POWER5/6 shared CPUs only)
C = CPU by processor stats for high numbers of CPU
l = long term CPU (over 75 snapshots) with bar graphs
m = Memory and Paging stats
M = Multiple Page Size stats in pages - 2nd time in MB's
k = Kernel Internal stats
n = Network stats
= = For Network & Disk Toggle KB/s to MB/s
O = Shared Ethernet Adapter VIOS only
N = NFS Network File System stats (2nd N gets you NFSv4)
d = Disk I/O Graphs (see -k to limit to specific disks)
D = Disk I/O Stats - multiple times gets you more stats
o = Disk I/O Map (one character per disk showing how busy it is)
g = Disk Group I/O Stats (have to use -g commandline option)
a = Adapter I/O Stats
^ = Fibre Channel Adapter via fcstat command
[ = Start an On demand nmon recording
] = Stop an On demand nmon recording
e = ESS vpath Logical Disk I/O Stats
V = Disk Volume Group stats
P = Paging Space stats
j = JFS Stats
t = Top Process Stats 1=Basic-Details 2=Accumulated-CPU
Performance sorted by 3=CPU 4=Size 5=I/O
u = Top but with command arguments shown (used with 3,4 & 5)
to refresh arguments (for new processes) hit u twice
U = as u plus Workload Classes/WPAR
W = Workload Management (WLM) Stats
S = WLM with SubClasses
w = use with top to show AIX wait processes (good for SMP)
A = Summarise Async I/O (aioserver) processes
v = Verbose this highlights problems on the machine and
categorises them as either danger, warnings or OK
b = black and white mode (or use -b option)
. = minimum mode i.e. only busy disks and processes
~ = switch to topas screen
key --- Other Controls ---
+ = double the screen refresh time
- = halves the screen refresh time
q = quit (also x)
0 = reset peak counts to zero (peak = ">")
space = refresh screen now
Startup Control
If you find you always type the same toggles every time you start
then place them in the NMON shell variable. For example:
export NMON=cmdrvtan
Others:
a) Do you want to stop nmon - kill -USR2 <nmon-pid>
b) Use -p and nmon outputs the background process pid
c) To limit the processes nmon lists (online and to a file)
Either set NMONCMD0 to NMONCMD63 to the program names
or use -C cmd:cmd:cmd etc. example: -C ksh:vi:syncd
d) To limit the disks nmon lists up to 64 (online only)
Use -k diskname,diskname,diskname (Example -k hdisk2,hdisk0,hdisk3)
As said, even with other combinations of switches, such as without -C, the output was not what I expected.
The Nagios guideline says that you should avoid temporary files if possible, so I wanted to keep it as tight as possible. Sure, there is not much value in using a variable to substitute a filename in two places. I remove the file as I don't want to leave any "rubbish" there.
On the other hand, the plugin handling of GroundWork (which sits on top of Nagios) is just a download from an httpd and doesn't clean up anything on the clients where the plugins run anyway.
I read about named pipes on the IBM help page, but I think it is OK this way, thanks.
Writing to a pipe means never having name, space, security or permission problems. A named pipe can have name and permission problems, I guess! Is there a way to direct the logging to an arbitrary file name? Does AIX have >(...) in ksh? bash always has it, but if there is no /proc/##/dev/fd/# or the like in the O/S, bash uses a mknod named pipe in /var/tmp/ (and they pile up, no purge). The */fd/# pipes are the best: named but private, local and ephemeral.
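The name and permission worries about named pipes can be reduced by creating the FIFO inside a private directory and removing both afterwards. The following is a minimal sketch of that idea. The function name read_via_fifo and the directory naming are made up for illustration, and whether nmon itself can record into a FIFO is not confirmed anywhere in this thread.

```sh
#!/bin/sh
# Sketch only: read a command's output through a FIFO that lives in a
# private directory and is removed afterwards.

# Run "$@" with stdout sent to a private FIFO and copy the FIFO to stdout.
# Returns the exit status of the command.
read_via_fifo() {
    dir="${TMPDIR:-/tmp}/fifo.$$"
    # mkdir fails if the name already exists, so nobody can pre-create it;
    # umask 077 keeps the directory private to the calling user.
    ( umask 077 && mkdir "$dir" ) || return 1
    fifo="$dir/out"
    mkfifo "$fifo" || { rmdir "$dir"; return 1; }

    "$@" > "$fifo" &    # writer: blocks until the reader opens the FIFO
    cat "$fifo"         # reader: ends when the writer closes its end
    wait $!
    status=$?

    # Clean up so that no pipes pile up in the temporary directory.
    rm -f "$fifo"
    rmdir "$dir"
    return $status
}
```

For example, read_via_fifo ps -ef would pass the ps output through the FIFO and leave nothing behind. No data ever lands on disk, and the directory is gone as soon as the function returns.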