Storing two dimensional array for postprocessing

Hi Community,

I would appreciate some quick help with the requirement below.

I am trying to process mpstat output from multiple blades of my server.
I would like to assign the output to an array and then use it for post-processing. How can I use a two-dimensional array and assign these values?

Desired output

        cpuusage(<CPU NO>,<Type>)

e.g.: cpuusage(2,irq) will return 0.04

mpstat -P ALL
Linux 3.0.101-0.15.1.6550.0.PTF-default (oam)  11/07/16        _x86_64_

14:58:33     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
14:58:33     all    0.07    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.82
14:58:33       0    0.14    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.75
14:58:33       1    0.07    0.00    0.07    0.00    0.00    0.01    0.00    0.00   99.85
14:58:33       2    0.07    0.00    0.12    0.00    0.00    0.04    0.00    0.00   99.77
14:58:33       3    0.22    0.00    0.20    0.00    0.00    0.05    0.00    0.00   99.53
14:58:33       4    0.12    0.00    0.15    0.00    0.00    0.01    0.00    0.00   99.72
14:58:33       5    0.11    0.00    0.18    0.00    0.00    0.02    0.00    0.00   99.69
14:58:33       6    0.08    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.88
14:58:33       7    0.07    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.88
14:58:33       8    0.07    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.89
14:58:33       9    0.12    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.83
14:58:33      10    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
14:58:33      11    0.03    0.00    0.09    0.00    0.00    0.04    0.00    0.00   99.85
14:58:33      12    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.94
14:58:33      13    0.07    0.00    0.03    0.00    0.00    0.00    0.00    0.00   99.90
14:58:33      14    0.03    0.00    0.40    0.00    0.00    0.18    0.00    0.00   99.39
14:58:33      15    0.05    0.00    0.10    0.00    0.00    0.00    0.00    0.00   99.84
14:58:33      16    0.07    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.91
14:58:33      17    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
14:58:33      18    0.02    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
14:58:33      19    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95

WHERE do you want to create that array (shell, text utility e.g. awk , a c programme)? How to pass it to post processing?

Why does cpuusage(2,irq) return 0.04 and not 0.00 ?

I am trying to use it in a shell script

Post processing is like a Performance Management File.

I sample CPU usage every 5mins, then average it for a 15min period. Store the stats into a file

Yes, sorry, it's my mistake. It should look like

cpuusage(2,soft) = 0.04

You failed to mention the shell you use. In case it's bash , try "associative arrays":

declare -A cpuusage
{ read
  read
  read -a HD
  while read -a TMP
    do for i in ${!HD[@]}
         do cpuusage[${TMP[1]}","${HD[$i]}]=${TMP[$i]}
         done
    done
} < file
echo ${cpuusage[2,%soft]}
0.04

I wonder if your idea makes sense at all. According to man mpstat the reported values are average since the system boot - unless you give an interval.
And if you want the average over all CPUs, consider vmstat or iostat -c (again, with an interval).

Looks like my bash shell does not have support for associative arrays ?

 
# more a.sh 
#!/bin/bash

declare -A cpuusage
{ read
  read
  read -a HD
  while read -a TMP
    do for i in ${!HD[@]}
         do cpuusage[${TMP[1]}","${HD[$i]}]=${TMP[$i]}
         done
    done
} < x
echo ${cpuusage[2,%soft]}
#
# more x
Linux 3.0.101-0.15.1.6550.0.PTF-default (SC-1)  11/08/16        _x86_64_

09:30:09     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
09:30:09     all    0.07    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.82
09:30:09       0    0.14    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.75
09:30:09       1    0.07    0.00    0.07    0.00    0.00    0.01    0.00    0.00   99.85
09:30:09       2    0.07    0.00    0.12    0.00    0.00    0.04    0.00    0.00   99.77
09:30:09       3    0.22    0.00    0.20    0.00    0.00    0.05    0.00    0.00   99.53
09:30:09       4    0.12    0.00    0.15    0.00    0.00    0.01    0.00    0.00   99.72
09:30:09       5    0.11    0.00    0.18    0.00    0.00    0.02    0.00    0.00   99.69
09:30:09       6    0.08    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.88
09:30:09       7    0.07    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.88
09:30:09       8    0.07    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.89
09:30:09       9    0.12    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.83
09:30:09      10    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
09:30:09      11    0.03    0.00    0.09    0.00    0.00    0.04    0.00    0.00   99.85
09:30:09      12    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.94
09:30:09      13    0.07    0.00    0.03    0.00    0.00    0.00    0.00    0.00   99.90
09:30:09      14    0.03    0.00    0.40    0.00    0.00    0.18    0.00    0.00   99.39
09:30:09      15    0.05    0.00    0.10    0.00    0.00    0.00    0.00    0.00   99.84
09:30:09      16    0.07    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.91
09:30:09      17    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
09:30:09      18    0.02    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
09:30:09      19    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
#
#
#./a.sh 
./a.sh: line 3: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
./a.sh: line 9: all,09: value too great for base (error token is "09")
./a.sh: line 10: 2,%soft: syntax error: operand expected (error token is "%soft")
# 

Associative arrays are available as of bash 4 . If that is not available on your system, you could try ksh93 or zsh with a bit of syntax adjustment ( typeset instead of declare , read -A instead of read -a , ...)
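
A script could check for this up front instead of failing on declare -A. The following is only a sketch, assuming bash; the helper name have_assoc_arrays is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the running shell is bash 4 or
# newer, i.e. when `declare -A` is available.
have_assoc_arrays() {
    [ -n "${BASH_VERSINFO[0]:-}" ] && [ "${BASH_VERSINFO[0]}" -ge 4 ]
}

if have_assoc_arrays; then
    declare -A cpuusage
    key='2,%soft'                 # "<CPU NO>,<Type>" as one string key
    cpuusage[$key]=0.04
    echo "${cpuusage[$key]}"
else
    echo "associative arrays need bash 4 or later (or ksh93/zsh)" >&2
fi
```

Keeping the key in a variable avoids having to quote the % sign inside the subscript.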

$ bash -c 'declare -A cpuusage; echo $?'
bash: line 0: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
2
$ bash4 -c 'declare -A cpuusage; echo $?'
0
$ 

Some improvement, but still fails with ksh93

# more a.ksh 
#!/bin/ksh93

typeset cpuusage
{ read
  read
  read -A HD
  while read -A TMP
    do for i in ${!HD[@]}
         do cpuusage[${TMP[1]}","${HD[$i]}]=${TMP[$i]}
         done
    done
} < x
echo ${cpuusage[2,%soft]}
#
#./a.ksh 
./a.ksh: line 9: :: invalid character in expression - all,09:30:09
./a.ksh: line 13: 2,%soft: arithmetic syntax error
#
#
#
# more x 
Linux 3.0.101-0.15.1.6550.0.PTF-default (SC-1)  11/08/16        _x86_64_

09:30:09     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
09:30:09     all    0.07    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.82
09:30:09       0    0.14    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.75
09:30:09       1    0.07    0.00    0.07    0.00    0.00    0.01    0.00    0.00   99.85
09:30:09       2    0.07    0.00    0.12    0.00    0.00    0.04    0.00    0.00   99.77
09:30:09       3    0.22    0.00    0.20    0.00    0.00    0.05    0.00    0.00   99.53
09:30:09       4    0.12    0.00    0.15    0.00    0.00    0.01    0.00    0.00   99.72
09:30:09       5    0.11    0.00    0.18    0.00    0.00    0.02    0.00    0.00   99.69
09:30:09       6    0.08    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.88
09:30:09       7    0.07    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.88
09:30:09       8    0.07    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.89
09:30:09       9    0.12    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.83
09:30:09      10    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
09:30:09      11    0.03    0.00    0.09    0.00    0.00    0.04    0.00    0.00   99.85
09:30:09      12    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.94
09:30:09      13    0.07    0.00    0.03    0.00    0.00    0.00    0.00    0.00   99.90
09:30:09      14    0.03    0.00    0.40    0.00    0.00    0.18    0.00    0.00   99.39
09:30:09      15    0.05    0.00    0.10    0.00    0.00    0.00    0.00    0.00   99.84
09:30:09      16    0.07    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.91
09:30:09      17    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
09:30:09      18    0.02    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
09:30:09      19    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95

Hi Scrutinizer,
With bash , you declare an associative array with:

declare -A array_name

but with ksh93 , you use:

typeset -A array_name

Hi sshark,
You're close, but there are two small problems:
Change:

typeset cpuusage

to:

typeset -A cpuusage

and change:

echo ${cpuusage[2,%soft]}

to:

echo ${cpuusage["2,%soft"]}

and then your script will do what I think you want it to do.

Hi Don,

Now the script works with your recommended changes

I do have a question though. The script we have drafted feeds input from a file.

However, how could I feed the output of the mpstat command directly into the script?

Will this work?

#!/bin/ksh93

typeset -A cpuusage
{ read
  read
  read -A HD
  while read -A TMP
    do for i in ${!HD[@]}
         do cpuusage[${TMP[1]}","${HD[$i]}]=${TMP[$i]}
         done
    done
} < `mpstat -P ALL`

echo ${cpuusage["14,%soft"]}

Let's say I run this script every 5 minutes. Is it possible to add each new reading to the previous value in the cpuusage array and store the sum? Then I could easily average it over the number of samples at the end of the sampling period.

No. The operand to the redirection operator (i.e. < ) is the name of a file to be opened for reading. The output from running the mpstat utility is not the pathname of a file. You could, however, change:

} < `mpstat -P ALL`

to:

} <<< `mpstat -P ALL`

or, using the non-obsolete form of command substitution:

} <<< $(mpstat -P ALL)
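
For anyone following along in bash 4 rather than ksh93, the same idea might look like the sketch below. fake_mpstat is a hypothetical stand-in for mpstat -P ALL (its rows are trimmed from the sample earlier in this thread, with an abbreviated banner) so that the snippet runs without sysstat installed. The command substitution is quoted so the newlines are sure to survive:

```shell
#!/usr/bin/env bash
# Stand-in for `mpstat -P ALL` so the sketch runs anywhere (hypothetical;
# banner abbreviated, rows trimmed from the sample in this thread).
fake_mpstat() {
    printf '%s\n' \
        'Linux (oam)  11/07/16  _x86_64_' \
        '' \
        '14:58:33     CPU    %usr    %sys   %soft   %idle' \
        '14:58:33       2    0.07    0.12    0.04   99.77' \
        '14:58:33      14    0.03    0.40    0.18   99.39'
}

declare -A cpuusage
{
    read -r junk         # banner line
    read -r junk         # blank line
    read -r -a HD        # column headings
    while read -r -a TMP; do
        for i in "${!HD[@]}"; do
            cpuusage["${TMP[1]},${HD[$i]}"]=${TMP[$i]}
        done
    done
} <<< "$(fake_mpstat)"   # here-string: the command's output becomes stdin

key='14,%soft'
echo "${cpuusage[$key]}"
```

In bash, process substitution ( < <(mpstat -P ALL) ) should be another way to get the same effect.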

Of course. Arithmetic expansions work just fine as the right hand side of an assignment command:

cpuusage[${TMP[1]}","${HD[$i]}]=$(( ${cpuusage[${TMP[1]}","${HD[$i]}]} + ${TMP[$i]} ))

Thanks Don, I am almost there. Now I am able to feed the mpstat output.

Your suggestion for the second question is still not working as required.

Let me output the contents

Linux 3.0.101-0.15.1.6550.0.PTF-default (SC-1)  11/08/16        _x86_64_

22:56:12     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
22:56:12     all    0.07    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.82
22:56:12       0    0.14    0.00    0.09    0.00    0.00    0.02    0.00    0.00   99.75
22:56:12       1    0.07    0.00    0.07    0.00    0.00    0.01    0.00    0.00   99.85
22:56:12       2    0.07    0.00    0.12    0.00    0.00    0.04    0.00    0.00   99.77
22:56:12       3    0.22    0.00    0.20    0.00    0.00    0.05    0.00    0.00   99.53
22:56:12       4    0.12    0.00    0.15    0.00    0.00    0.01    0.00    0.00   99.72
22:56:12       5    0.11    0.00    0.18    0.00    0.00    0.02    0.00    0.00   99.69
22:56:12       6    0.08    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.87
22:56:12       7    0.07    0.00    0.05    0.00    0.00    0.00    0.00    0.00   99.88
22:56:12       8    0.07    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.89
22:56:12       9    0.12    0.00    0.04    0.00    0.00    0.00    0.00    0.00   99.83
22:56:12      10    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
22:56:12      11    0.03    0.00    0.09    0.00    0.00    0.04    0.00    0.00   99.85
22:56:12      12    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.94
22:56:12      13    0.07    0.00    0.03    0.00    0.00    0.00    0.00    0.00   99.90
22:56:12      14    0.03    0.00    0.40    0.00    0.00    0.18    0.00    0.00   99.39
22:56:12      15    0.05    0.00    0.10    0.00    0.00    0.00    0.00    0.00   99.84
22:56:12      16    0.07    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.91
22:56:12      17    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
22:56:12      18    0.02    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95
22:56:12      19    0.03    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.95

As you can see, when we store the CPU usage in the cpuusage array, I only need to accumulate the CPU usage values (add each reading to the previous reading). Your suggestion to add using the command below also tries to add the first column and the CPU number, which is not required

cpuusage[${TMP[1]}","${HD[$i]}]=$(( ${cpuusage[${TMP[1]}","${HD[$i]}]} + ${TMP[$i]} ))

It always helps to experiment and do some reading, e.g. man pages . For your latest problem, start the evaluation/calculation only at the third column. Try

{ read
  read
  read -a HD
  while read -a TMP
     do for (( i=2; i<${#HD[@]}; i++ ))
          do ((cpuusage[${TMP[1]}","${HD[$i]}]+=${TMP[$i]}))
          done
     done
 } < file

Please note that this was tested in bash using integers only, because bash, unlike ksh, doesn't offer floating-point arithmetic in the shell.
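
If bash has to handle the decimal values anyway, one possible workaround is to hand each addition to awk. This is only a sketch: fadd is a hypothetical helper name and the three readings are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add two decimal numbers with awk, because bash's
# $(( )) and (( )) only handle integers.
fadd() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'
}

declare -A cpuusage
key='2,%soft'
for sample in 0.04 0.05 0.03; do       # three made-up readings
    # ":-0" starts the running total at 0 on the first pass
    cpuusage[$key]=$(fadd "${cpuusage[$key]:-0}" "$sample")
done

echo "${cpuusage[$key]}"
```

This starts one awk process per cell, which is slow compared with ksh93's built-in arithmetic but probably acceptable for a sample taken every few minutes.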

I assumed that you had written the code you had shown us and understood what it was doing. I showed you a syntax for adding fields and assumed that you would be able to modify your own code to do straight assignments or to skip assigning anything at all for data in columns you didn't want to sum.

Assuming that you don't need to keep any data from the first two columns of your data, the code RudiC suggested above should also work just fine in a recent Korn shell as long as you change the read -a to read -A in both of the places where it occurs.

Thanks Don & RudiC

I was able to modify the script and able to get the desired behavior. Below is my new script

#!/bin/ksh93

typeset -A cpuusage
{  read -A HD
  while read -A TMP
    do for i in ${!HD[@]}
        do
         cpuusage[${TMP[0]}","${HD[$i]}]=$(( ${cpuusage[${TMP[0]}","${HD[$i]}]} + ${TMP[$i]} ))
        done
    done
} <<< $(mpstat -P ALL | awk '{ $1=""; print $0 }' | awk 'NR>=3')

mpstat -P ALL | awk '{ $1=""; print $0 }' | awk 'NR>=3'
echo ${cpuusage["14,%sys"]}

Now moving to my next step.

Question:
I am using this cpuusage array within a while loop and repeating it for a pre-defined period. Now I want to use this cpuusage array and extract values outside the while loop and save it into a file. Looks like the scope of cpuusage is not available outside the while loop. Below is the snippet of my main script

		while true
			do
					echo "Sampling started"
					
						typeset -A cpuusage
						{  read -A HD
						  while read -A TMP
							do for i in ${!HD[@]}
								do
								 cpuusage[${TMP[0]}","${HD[$i]}]=$(( ${cpuusage[${TMP[0]}","${HD[$i]}]} + ${TMP[$i]} ))
								done
							done
						} <<< $(mpstat -P ALL | awk '{ $1=""; print $0 }' | awk 'NR>=3')

						echo "sampling done"
						
						if [ $SCANTIME -eq $GRANULARITYPERIOD ]
						then
								break 2
						fi

						sleep $SCANTIME
						SCANTIME=$((SCANTIME+5))
						echo "------------------"
						echo "SCANNING TIME: $SCANTIME"

			done

		echo "TRYING TO PROCESS OUTPUT"

			for CPUCORE in [0..19]
			do
				cat >> $pmFilename <<- EOM
											<r p="1">`echo "scale=2; ${cpuusage["$CPUCORE,%iowait"]}/3" | bc -l | sed 's/^\./0./'`</r>
											<r p="2">`echo "scale=2; ${cpuusage["$CPUCORE,%irq"]}/3" | bc -l | sed 's/^\./0./'`</r>
											<r p="3">`echo "scale=2; ${cpuusage["$CPUCORE,%nice"]}/3" | bc -l | sed 's/^\./0./'`</r>
											<r p="4">`echo "scale=2; ${cpuusage["$CPUCORE,%soft"]}/3" | bc -l | sed 's/^\./0./'`</r>
											<r p="5">`echo "scale=2; ${cpuusage["$CPUCORE,%sys"]}/3" | bc -l | sed 's/^\./0./'`</r>
											<r p="6">`echo "scale=2; ${cpuusage["$CPUCORE,%idle"]}/3" | bc -l | sed 's/^\./0./'`</r>
											<r p="7">`echo "scale=2; ${cpuusage["$CPUCORE,%usr"]}/3" | bc -l | sed 's/^\./0./'`</r>
				EOM
			done

---------- Post updated 11-09-16 at 12:35 AM ---------- Previous update was 11-08-16 at 05:49 PM ----------

Got it sorted, my for loop was incorrect :frowning:

for CPUCORE in {0..19}
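
The difference between the two forms can be seen by counting iterations. A small sketch, assuming bash (ksh93 and zsh also support brace expansion):

```shell
#!/usr/bin/env bash
# [0..19] is a glob character class, not a range: with no matching file name
# in the current directory the loop runs once with the literal text.
n_bracket=0
for CPUCORE in [0..19]; do
    n_bracket=$((n_bracket + 1))
done

# {0..19} is brace expansion and produces the twenty words 0 1 2 ... 19.
n_brace=0
for CPUCORE in {0..19}; do
    n_brace=$((n_brace + 1))
done

echo "bracket form: $n_bracket iteration(s), brace form: $n_brace"
```

With the bracket form the array was being looked up under a key that was never stored, which would explain the empty values rather than any scoping problem.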

Now that you bring awk into play, did you consider to use it for the overall solution, i.e. do the entire thing in one single awk script?
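
As a rough sketch of what that could look like: the function below reads one or more concatenated mpstat -P ALL outputs and prints the per-CPU average of every column. It assumes 24-hour timestamps as in the samples above (so the CPU id is the second field); avg_mpstat is a made-up name, and the second sample in the demo input is invented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average every %column per CPU over all samples read
# from standard input (one or more concatenated `mpstat -P ALL` outputs).
avg_mpstat() {
    awk '
        $2 == "CPU" {                      # heading line: remember column names
            for (i = 3; i <= NF; i++) name[i] = $i
            samples++
            next
        }
        $2 ~ /^(all|[0-9]+)$/ {            # data line: $2 is the CPU id
            for (i = 3; i <= NF; i++) sum[$2 "," name[i]] += $i
        }
        END {                              # one "cpu,column average" per line
            for (k in sum) printf "%s %.2f\n", k, sum[k] / samples
        }
    ' | sort
}

# Demo input: first sample trimmed from this thread, second one invented.
avg_mpstat <<'EOF'
14:58:33     CPU    %usr    %sys   %soft
14:58:33       2    0.07    0.12    0.04
14:58:33      14    0.03    0.40    0.18
15:03:33     CPU    %usr    %sys   %soft
15:03:33       2    0.09    0.10    0.02
15:03:33      14    0.05    0.20    0.22
EOF
```

awk does floating-point arithmetic natively and its arrays are associative, so neither the shell version nor bc matters here.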

I would replace

cpuusage[${TMP[0]}","${HD[$i]}]=$(( ${cpuusage[${TMP[0]}","${HD[$i]}]} + ${TMP[$i]} ))

by

cpuusage["${TMP[0]},${HD[$i]}"]=$(( cpuusage["${TMP[0]},${HD[$i]}"] + ${TMP[$i]} ))

And

$(mpstat -P ALL | awk '{ $1=""; print $0 }' | awk 'NR>=3')

by

"$(mpstat -P ALL | awk 'NR>=3 { $1=""; print }')"

And

echo "scale=2; ${cpuusage["$CPUCORE,%sys"]}/3" | bc -l | sed 's/^\./0./'

by ksh-builtins

printf "%.2f\n" $(( cpuusage["$CPUCORE,%sys"]/3.0 ))

Still, I doubt there will be any real profit - besides the good practical exercise!

RudiC - Not really. You have seen how good I am in the scripting :slight_smile:
Attempting this would be another big exercise. I will look at it once I finish this script

MadeInGermany - Thanks for the recommendations, I will adopt your changes

------

Another question related to how to initiate execution of this script that I am preparing

  1. I do not want to use a cronjob, as we do not have permission to use it
  2. I would like to start the script on the next possible quarter hour
    e.g. Current time is 11:46. If I run the script, it should pause initially and continue executing from 12:00

How can I implement this without using a cronjob?
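
One cron-free possibility is to let the script itself sleep until the next quarter hour. The sketch below assumes that date supports +%s (a GNU/BSD extension) and that the local UTC offset is a multiple of 15 minutes; secs_to_next_quarter is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds from the given epoch time to the next quarter
# hour. Assumes the local UTC offset is a multiple of 15 minutes, so quarter
# hours line up with multiples of 900 epoch seconds.
secs_to_next_quarter() {
    echo $(( 900 - $1 % 900 ))
}

wait_secs=$(secs_to_next_quarter "$(date +%s)")
echo "would sleep $wait_secs seconds before the first sample"
# sleep "$wait_secs"    # uncomment in the real script
```

For a start at 11:46:00 this gives 840 seconds, i.e. the first sample at 12:00. Started exactly on a quarter hour, it waits a full 15 minutes.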

Do you have the at utility available?

And, please post the desired output from a (reduced: 2 - 3 cpus, 2 - 3 time points) input sample.

Yes, I do have the at utility on my system.

Sorry, I did not understand whether you need the output of the script or the input to the script, i.e. the mpstat command output.