Question about details for the whole machine

Hi folks,

As I continue my self-torture [ >:) ], I've come across an interesting issue.
I now have a script that uses top on a Solaris box to gather performance data into a file for tracking overall performance.

And it even works 99.99% of the time.

But it glitches eventually and leaves a process still running and burning cpu cycles.

The unix admin who has been "helping me" just smiled as if he expected that and said I should use utilities which are native to Solaris.

Well, what I need from top are:

  • the load averages
  • the CPU idle state data
  • the memory

from the displayed results below:
   last pid: 25033;  load avg:  0.11,  0.77,  0.59;  up 21+07:05:28       16:25:24
   104 processes: 103 sleeping, 1 on cpu
   CPU states: 96.6% idle,  1.5% user,  2.0% kernel,  0.0% iowait,  0.0% swap
   Kernel: 1181 ctxsw, 28 trap, 1072 intr, 1549 syscall, 26 flt
   Memory: 10G phys mem, 4408M free mem, 9083M total swap, 9083M free swap
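If you do keep using top, here is a minimal sketch of pulling just those three items out of a saved top header. It assumes the header layout shown above (other top builds may format it differently), and `parse_top_header` is a hypothetical helper name. On Solaris, run it with `AWK=nawk`, because the default /usr/bin/awk is the old awk.

```shell
# Assumption: the top header looks like the sample above.
# On Solaris, run with AWK=nawk (the default /usr/bin/awk is the old awk).
AWK=${AWK:-awk}

# parse_top_header (hypothetical): reads a top header on stdin and prints
# "idle,physmem,freemem,load1, load5, load15"
parse_top_header() {
    $AWK '
    /load avg/ {
        line = $0
        sub(/.*load avg: */, "", line)   # drop everything before the numbers
        sub(/;.*/, "", line)             # drop "; up ..." and the clock
        gsub(/  */, " ", line)           # squeeze runs of spaces
        load = line
    }
    $1 == "CPU" && $2 == "states:" { idle = $3 }             # e.g. "96.6%"
    $1 == "Memory:"                { phys = $2; free = $5 }  # "10G", "4408M"
    END { printf "%s,%s,%s,%s\n", idle, phys, free, load }'
}

# Usage: feed it whatever header text your script already captures, e.g.
#   parse_top_header < top_output.txt     (top_output.txt is hypothetical)
```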

Now I can get the load averages easy with:
# prstat 1 1 | grep load | awk '{print $8, $9, $10}'

Sadly, prstat gives most of its data per process, whereas I need aggregated data for the entire box (as top gives).

When I investigated getting the CPU and memory data from vmstat, I could find no way of getting the values for "iowait" or "swap". The man page says there should also be a "wa" value, which I do not get from my vmstat, as shown:

# vmstat 1 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s3 --   in   sy   cs us sy id
 3 0 0 11893120 5017848 7 46 1  0  0  0  0  0  2 -0  0  934 1029 1080 16  1 82

As for memory, the only values I see in vmstat are Swap and Free. Nothing about Physical memory, free or otherwise.

I investigated using sar, but our customer has it disabled and does not wish to enable it.

I am looking into my options, which are broadening as I look at things like "kstat", which I do not know how to use, and mpstat (with the -a flag). But these also do not get me the complete set of data that top does.

On the basis that this is just a box following instructions, I am assuming there is another way to get what top does from a Solaris box, and I am hoping someone here can give me some direction.

I've even looked at the source for vmstat (vmstat.c) to see if I can figure out what it does (and no, I do not program in C at all), and that led me to a bunch of ".h" files I'm exploring as I burn more and more work time.

Can anyone guide me to an answer for Solaris?

Thanks and sorry for rambling.

Marc

You might want to use "top -Z" which gives some statistics per zone.

Not finding "iowait" and "swap" in vmstat is no surprise:

  • "iowait" ceased to be reported by Solaris many years ago because it was confusing, meaningless, and commonly misinterpreted.
  • There is no "swap" CPU state so it should always display 0% here.

The vmstat "free" column is definitely about physical memory. The "swap" column is not closely related to what top reports: here, "swap" means available virtual memory. What top reports is swap area usage, which you can get with "swap -l" on Solaris.
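swap -l reports its sizes in 512-byte blocks, so a small sketch that sums every swap area and converts the result to megabytes might look like this. `swap_totals` is a hypothetical name, and it assumes the five-column layout (swapfile, dev, swaplo, blocks, free) with one header line.

```shell
# On Solaris, run with AWK=nawk.
AWK=${AWK:-awk}

# swap_totals (hypothetical): sums the "blocks" and "free" columns of
# "swap -l" output (512-byte blocks) and prints "totalM,freeM".
# 2048 blocks of 512 bytes = 1 MB; fractions are truncated.
swap_totals() {
    $AWK 'NR > 1 && NF >= 5 { total += $4; free += $5 }
          END { printf "%dM,%dM\n", total / 2048, free / 2048 }'
}

# Usage on Solaris: /usr/sbin/swap -l | swap_totals
```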

Most of the statistics-gathering commands (e.g. vmstat, iostat, mpstat, netstat, ...) use the kstat interface for part or all of their input data. The kstat command lets you get the low-level data from which they build a more readable representation.
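As one example of using kstat directly, here is a sketch that reads the `physmem` and `freemem` counters from `unix:0:system_pages` (both in pages) and multiplies them by the page size from the `pagesize` command. `phys_mem` is a hypothetical helper. Note that `physmem` counts the memory the kernel manages, which may be slightly less than the installed size prtdiag reports.

```shell
# On Solaris, run with AWK=nawk (the old /usr/bin/awk has no -v option).
AWK=${AWK:-awk}

# phys_mem (hypothetical): reads "kstat -p" lines for physmem and freemem
# (values in pages) and prints "totalM,freeM,usedM".
# $1 = page size in bytes (8192 on sun4u; see the pagesize command).
phys_mem() {
    $AWK -v pg="$1" '
        $1 ~ /:physmem$/ { phys = $2 }
        $1 ~ /:freemem$/ { free = $2 }
        END {
            mb = 1048576
            printf "%dM,%dM,%dM\n", phys * pg / mb, free * pg / mb, (phys - free) * pg / mb
        }'
}

# Usage on Solaris:
#   kstat -p unix:0:system_pages:physmem unix:0:system_pages:freemem |
#       phys_mem "$(pagesize)"
```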

No single command will. top gathers data from different sources (mostly kstat and /proc) and consolidates it in its own way.

That's a wrong assumption if you expect a single alternative tool that provides the same set of statistics. If you want top, just use it and fix whatever doesn't work in the way you call it. Otherwise, if you are not interested in process-specific information, you'll have to aggregate data from different commands or process kstat output.


Hi jlliagre!

First, thanks for responding!

So taking from your response,
The "iowait" and "swap" CPU states are items I should not worry about.

I can use the vmstat "free" column for free physical memory, but how do I get the total physical memory?

I have checked and like the results from "swap -l". Thank you!

In the end, I don't want one single utility to do this. Or rather, once I understand how to get what I want I can build my script which will be the "one single utility" I use and can share.

Ultimately, the output needs to be:
Box Name, Mar-20-13,10:45:05,97.5%,10G,4473M,9083M,9083M,0.06, 0.05, 0.05
Where:
97.5% = CPU Idle [ This I still need; see below ]
10G = Total phys memory [ This I can get from "vmstat 1 1" per your reply ]
4473M = Phys Memory in use [ This I still need ]
9083M = Total Swap [ This and the next I can get from "swap -l" per your reply ]
9083M = Swap in use (this is a lab machine for design)
0.06, 0.05, 0.05 = the CPU load averages [ This I can get from "prstat 1 1" ]
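Once the individual values are collected, writing them out in that layout is a single printf. This is a sketch: `format_record` is a hypothetical name, it uses `uname -n` for the box name, and it expects the six values in the order listed above, with the three load averages passed as one quoted argument.

```shell
# format_record (hypothetical): prints one line in the layout
#   Box Name, Mon-DD-YY,HH:MM:SS,idle,phys,meminuse,swaptotal,swapinuse,loads
# Pass exactly six arguments; with more, printf would reuse the format.
format_record() {
    printf '%s, %s,%s,%s,%s,%s,%s,%s\n' \
        "$(uname -n)" "$(date '+%b-%d-%y,%H:%M:%S')" "$@"
}

# Example:
#   format_record 97.5% 10G 4473M 9083M 9083M "0.06, 0.05, 0.05"
```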

Regarding the CPU Idle,
Top shows "97.5%"
while vmstat shows "83", meaning 83 percent.

Is this just an artifact of top not being accurate in a Solaris OS or is there something I am missing?

And on the "Phys Memory in use", how do I get that?
I am researching this, but came here when I found myself trying to read up on vmstat, prstat, kstat and several other things all at the same time. If I had one path, I could walk it myself. But I need direction on which to use, or I'll be balancing on one toe on each of many paths rather than walking one with both feet.

Thanks for your help.

Marc

---------- Post updated at 12:01 PM ---------- Previous update was at 11:06 AM ----------

An additional discovery I've made is that:
vmstat 1 1
gives a "free" of 5001808
where
kstat -n system_pages | grep availrmem
gives a "availrmem" of 1207874

So as I continue to research, I am finding that each of the roads I am walking gives answers which appear similar but are vastly different.

Marc

---------- Post updated at 04:43 PM ---------- Previous update was at 12:01 PM ----------

With help from you folks and my own research I found:

/usr/sbin/prtdiag | grep "Memory size" | awk '{print $3}'
gets me the physical memory size

/usr/bin/vmstat 1 1 | grep -v free | grep -v faults | awk '{print $5}'
gets me the free physical memory

prstat 1 1 | grep load | awk '{print $8, $9, $10}'
gets me the load averages

/usr/sbin/swap -l
gets me the swap size and free swap

/usr/bin/vmstat 1 1 | grep -v swap | awk '{print $22}' | sed '/^$/d'
gets me the CPU idle

Thanks for the guidance!!!!

Marc

What you are missing is that the first set of statistics reported by most of the *stat commands is an average since the last boot.

Instead of "vmstat 1 1", you should run "vmstat 1 2" and use the values from the last line.
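Following that advice, here is a sketch that keeps only the idle column of the last data line, the one covering the sampling interval rather than the time since boot. It relies on idle being the last column of Solaris vmstat output, as in the sample earlier in the thread; `vmstat_idle` is a hypothetical name.

```shell
# On Solaris, run with AWK=nawk.
AWK=${AWK:-awk}

# vmstat_idle (hypothetical): prints the last column ("id") of the last
# numeric line of vmstat output. With "vmstat 1 2" the first data line is
# the since-boot average, so the last one is the 1-second sample.
vmstat_idle() {
    $AWK '$NF ~ /^[0-9]+$/ { idle = $NF } END { print idle }'
}

# Usage on Solaris: vmstat 1 2 | vmstat_idle
```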

The fact that they cover different periods of time and are reported in different units (KB vs. 4 or 8 KB pages, depending on the architecture) is also worth considering ;)

  1. get a new top program that does not get stuck
  2. if it still gets stuck, call it through a timeout wrapper:
perl -e "alarm 3600; exec @ARGV" top ...

This one will kill top after 1 hour.
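If you would rather not depend on perl, the same idea can be sketched in plain ksh or bash: start the command in the background, start a watcher that sends SIGTERM after N seconds, and cancel the watcher if the command finishes first. `run_with_timeout` is a hypothetical name, and the arguments after the timeout are whatever top invocation your script already uses.

```shell
# run_with_timeout (hypothetical): run_with_timeout SECONDS command [args...]
# Returns the command's exit status (e.g. 143 if it was killed by SIGTERM).
# Uses plain global variables (rwt_*) so it also works in ksh88.
run_with_timeout() {
    rwt_secs=$1
    shift
    "$@" &
    rwt_pid=$!
    # Watcher: after rwt_secs seconds, send SIGTERM to the command.
    ( sleep "$rwt_secs"; kill "$rwt_pid" 2>/dev/null ) >/dev/null 2>&1 &
    rwt_watcher=$!
    wait "$rwt_pid"
    rwt_status=$?
    # Stop the watcher so it cannot fire later; its sleep may linger
    # harmlessly until it expires.
    kill "$rwt_watcher" 2>/dev/null
    return $rwt_status
}

# Usage: run_with_timeout 3600 top ...
```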

ok....
I've been reading here and also doing a lot of research

As jlliagre says,

if I use the command:

"vmstat 1 2 | cut -b77-78 | grep -v id"

I get the following output:

85
99

Now "85" is based on the period since the box was booted,
so "99" is what I want.

So how do I grab only the second value?

I tried a test like this:

   set data=`vmstat 1 2 | cut -b77-78 | grep -v id`
   echo $data

But that only hung at the prompt until the process ran, and gave me no data.
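On the `set data=...` attempt: that is C-shell syntax. In ksh and bash, `set` does not assign a variable; it replaces the positional parameters, so `$1` becomes the string `data=...` and `$data` stays empty. A small sketch of the difference, where `echo 99` stands in for the real vmstat pipeline:

```shell
# csh style: in ksh/bash this sets $1 to "data=99" and leaves $data untouched.
set data=`echo 99`

# ksh/bash style: no "set", and no spaces around "=".
data=$(echo 99)

# With the real pipeline on Solaris, keeping only the last line
# (the 1-second sample rather than the since-boot average):
#   data=$(vmstat 1 2 | tail -1 | nawk '{ print $NF }')
echo "$data"
```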

Thanks for your continuing help.

Marc

---------- Post updated at 02:52 PM ---------- Previous update was at 02:50 PM ----------

Actually, it continued to get stuck, so I added a command after top to check whether top was still running and kill it.

That failed to resolve the issue to the sys admin's satisfaction.

---------- Post updated at 03:53 PM ---------- Previous update was at 02:52 PM ----------

additionally...

When I run the following from the command line, I get:

bash-3.2# vmstat 1 2 | grep -v id | cut -b77-78

85
98
bash-3.2#
  Note, the blank line is part of the output

but when I create a script: "test.sh" containing:

bash-3.2# cat test.sh
/usr/bin/ksh

vmstat 1 2 | grep -v id | cut -b77-78

and I run that, I get:

bash-3.2# ./test.sh
#

I then have to type "exit" to allow the script to complete execution as follows:

bash-3.2# ./test.sh
# exit

85
98
bash-3.2#

So the more I try, the more I am both learning and digging my wheels in deeper. :(

I'm hoping someone can help me out.

Marc

Assuming you want the last column (idle cpu), here is one way to get it:

vmstat 1 2 | nawk 'NR==4{print $NF}'

That would be quite inefficient, though, if it is used to extract only a single value, since I understand you want several columns.
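Along those lines, here is a sketch that reads a single `vmstat 1 2` run and pulls several columns from the last line at once: free memory (column 5, reported in KB by Solaris vmstat) converted to MB, and idle CPU (the last column). `vmstat_free_idle` is a hypothetical name, and the column positions assume the Solaris 10 layout shown earlier in the thread.

```shell
# On Solaris, run with AWK=nawk.
AWK=${AWK:-awk}

# vmstat_free_idle (hypothetical): from the last numeric line of
# "vmstat 1 2", print "freeM,idle%". Column 5 is free memory in KB,
# the last column is idle CPU.
vmstat_free_idle() {
    $AWK '$NF ~ /^[0-9]+$/ { free = $5; idle = $NF }
          END { printf "%dM,%s%%\n", free / 1024, idle }'
}

# Usage on Solaris: vmstat 1 2 | vmstat_free_idle
```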


Great!
I'll try that as soon as I get a chance.

Any idea why the command I used, "vmstat 1 2 | cut -b77-78 | grep -v id",
was causing a hang?

After I ran the command from a script, and before the cut and grep, it held at a hash prompt (#), and I had to enter "exit" and hit return.

Then it executed the cut and grep.

Very odd, as the command worked perfectly when entered manually on the command line.

I think you want to get rid of the unwanted lines BEFORE you cut.

And I think it's going to be more reliable to get the data (id or whatever) with cut -f to get the field you want.

$ vmstat 1 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   3252  24724 416548 271716    0    0    80    18   26   50  0  0 99  1
 0  0   3252  24716 416548 271716    0    0     0     0   46   75  0  0 100  0
$ vmstat 1 2 | tail -2 | sed "s/  */ /g" | cut -d " " -f 16
99
100

Actually, where we are getting the "99" and the "100", I need only the "100".

In the end, I am working on a Solaris system where "top" is not native. And it's been...stuttering for lack of a better word.

So I'm trying to get at the idle value another way, and I only need the last value.

Marc

There is absolutely no reason for this command to hang.
The cause would be a bug in your script or an issue with your environment.
It would help if you post your whole script content.

---------- Post updated at 07:59 ---------- Previous update was at 07:55 ----------

Beware that this is the Solaris forum while your vmstat output is from Linux or similar. They are slightly different.
Also, you probably mean tail -1 , not tail -2 .


Well, here is the entire script, as it's just a test of the command functionality:

/usr/bin/ksh
vmstat 1 2 > /usr/home/mgrma/testFile
# vmstat 1 2 | nawk 'NR==4{print $NF}'
# vmstat 1 2 | awk '{print $22}' | awk '/[0-9]/'
# vmstat 1 2 | cut -b77-78 | grep -v id >> /usr/home/mgrma/testFile
# sed '2' /usr/home/mgrma/testFile

Each of the commented lines is a different method I have tried, including recommendations from here. With each try, I can run the command from the command line with no issue.

But when I put it into the script and run it, I get a "#" prompt and have to type "exit"

Once I do, there is the expected delay for vmstat to do its thing and I get the correct data. As you can see, I even tried redirecting it to a file to process in later commands.

Quote:
Beware that this is the Solaris forum while your vmstat output is from Linux or similar. They are slightly different.
Also, you probably mean tail -1 , not tail -2 .

As for Linux, here is the uname output for the environment:

bash-3.2# uname -a
SunOS UsrBox 5.10 Generic_147440-27 sun4u sparc SUNW,Sun-Fire-V240
bash-3.2#

I am sudo to the bash user.

Marc

---------- Post updated at 11:19 AM ---------- Previous update was at 11:04 AM ----------

Never mind...

I'm a moron!

I was using "#!/usr/bin/ksh"

not "#!/bin/ksh"

Sorry for the confusion!

This wasn't directed to you but to hanson44.

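Actually, #!/usr/bin/ksh would have been fine, just like #!/bin/ksh; the real issue is that you were missing the shebang (#!). Without it, the first line /usr/bin/ksh is just an ordinary command that starts a new interactive ksh (hence the "#" prompt), and the rest of the script only runs after you exit that shell.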