How big is my awk array?

pondlife · September 10, 2007, 7:57am

Hi All,

I'm creating a script that goes through some csv files (output from sar) trying to get some statistics on system performance. This is the code I have so far:

 
awk -F"\",\"" 'NR != 1 {
                           per[$10]++
                           sum += $10
                   }
                   END {
                       print ARGV[ARGIND]
                       for (i in per)
                           print i" : "per[i]
                       print "Total : "sum
                       print "Lines : "(NR -1)
                   }' $FILES

The input files have a header line, which accounts for the "NR != 1" and "(NR -1)".

What I need to do is normalise the data working out the 90th and 95th percentile (I haven't quite got to the math part yet - I'm going to try and calculate the cumulative frequency first... ). So what I believe I need to do next is sort my array - but I don't know how many elements are stored in it. Can anyone tell me how to do this?

Some example output here (from just one input file):

snsax-psmpw-5-2007-01-07_MemorySwapUtilisation.csv
9.63 : 1
9.64 : 20
9.6 : 1
9.65 : 29
9.66 : 17
9.87 : 11
9.78 : 1
9.89 : 1
9.61 : 38
9.62 : 24
Total : 1380.4
Lines : 143

Thanks in advance

Klashxx · September 10, 2007, 8:41am

Use a counter:

awk -F"\",\"" 'NR != 1 {
                           per[$10]++
                           sum += $10
                           if ( per[$10] == 1 )
                                c++
                   }
                   END {
                       print ARGV[ARGIND]
                       print "Array obj:",c
                       for (i in per)
                           print i" : "per[i]
                       print "Total : "sum
                       print "Lines : "(NR -1)
                   }' $FILES

pondlife · September 10, 2007, 8:54am

Hi Klashxx,

It works - thanks!

Can you explain it to me because I'm a little confused... I can't understand why it works at any other time apart from the first pass - when the variable per[$10] is equal to one...

Many thanks.

p.

vgersh99 · September 10, 2007, 9:46am

Or alternatively, change

                           per[$10]++
                           sum += $10
                           if ( per[$10] == 1 )
                                c++

TO

                           if ( !($10 in per) ) c++;
                           per[$10]++
                           sum += $10

Klashxx · September 10, 2007, 12:40pm

You're completely right, vgersh99, but I thought the other version was clearer for this particular case.

pondlife, you're using an associative array, where the index is the 10th field of each input line and the value is a counter: the number of times that field value has been seen.

Each index is unique, so the first time you add a new entry to the array (a new index), its counter value will be 1:

per[$10] = 1

On the other hand, if the index already exists, the only change is that its counter increases, so:

per[$10] > 1

Finally, we can say that:

   if ( per[$10] == 1 )
      c++

Or, when tested before per[$10]++ runs:

if ( !($10 in per) ) c++

indicates that a new element was added to the array.
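
To see this happen line by line, here is a small trace, assuming made-up single-column input so that $1 stands in for $10. The function name trace_counter is hypothetical. It prints the value, its counter per[$1], and the running element count c after each input line.

```shell
# Hypothetical helper for illustration; $1 stands in for $10 from the thread.
trace_counter() {
    awk '{
        per[$1]++                 # count this occurrence of the value
        if (per[$1] == 1) c++     # counter is 1 only on first sight: new element
        print $1, per[$1], c      # value, its frequency so far, distinct values so far
    }'
}

printf '9.61\n9.62\n9.61\n9.61\n' | trace_counter
# 9.61 1 1
# 9.62 1 2
# 9.61 2 2
# 9.61 3 2
```

c only moves on the lines where the counter is 1, which is why the test still does the right thing on later passes. As a cross-check, counting the keys in the END block with "for (k in per) n++" should give the same number, and gawk also accepts length(per) for an array.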

Hope this helps.

Regards