Efficient awk way to add numbers in line fields

data.now:

blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,on__host=93 SERVICE__ALERT_=51 Warning___The__results__of__service=16 Warning___on__host=92 Auto_save__of__retention__data__completed=1 Warning___Return=68 PASSIVE__SERVICE__CHECK_=53 ,1026--1313,1
blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,on__host=93 SERVICE__ALERT_=51 Warning___The__results__of__service=16 Warning___on__host=93 Auto_save__of__retention__data__completed=1 Warning___Return=68 PASSIVE__SERVICE__CHECK_=53 ,1026--1313,1
blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,on__host=93 SERVICE__ALERT_=51 Warning___The__results__of__service=16 Warning___on__host=92 Auto_save__of__retention__data__completed=1 Warning___Return=8 PASSIVE__SERVICE__CHECK_=53 ,1026--1313,1
blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,on__host=93 SERVICE__ALERT_=51 Warning___The__results__of__service=16 Warning___on__host=99 Auto_save__of__retention__data__completed=1 Warning___Return=68 PASSIVE__SERVICE__CHECK_=53 ,1026--1313,1
blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,on__host=93 SERVICE__ALERT_=51 Warning___The__results__of__service=16 Warning___on__host=75 Auto_save__of__retention__data__completed=1 Warning___Return=38 PASSIVE__SERVICE__CHECK_=53 ,1026--1313,1
blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,on__host=93 SERVICE__ALERT_=51 Warning___The__results__of__service=16 Warning___on__host=12 Auto_save__of__retention__data__completed=1 Warning___Return=28 PASSIVE__SERVICE__CHECK_=53 ,1026--1313,1

The fields in each line above are separated by "comma". What im interested in is field 9.

Field 9 has a number of values.

What i want to do is two parts:

  1. Be able to add up all the values of a specific pattern in all the lines in the datafile. For instance, if i want to know the total value "on__host" on all lines in the data file.

  2. Be able to add up all the values from all the patterns on each line, on all the lines in the log.

There was an old data i was working with which only had a single value in the 9th field. that was easy to handle. The data looked like this:

data.prev

blah1,dah,blaha,sweet,games.log,5297484456,nagios-toin,529748456,53,1026--1313,1

And all I needed to do to add up the values in the 9th field of all the lines was:

awk -F, 'BEGIN{sum=0} {sum+=$9} END {print sum}' data.prev

The kind of script that would solve this problem for me would look like this:

#!/bin/sh
Pattern=$1
if [ "${Pattern}" = "allpatterns" ] ; then
   awk should add up all the values in the 9th field of data in data.now
else
   if the user did not specify "allpatterns", then, awk should take the pattern name specified by the user and use that to decide which pattern to add up in the 9th field of all the lines.
    awk -F"," '$9 ~ /'${Pattern}'/ '{do awk magic}'  -- this is just an idea.
fi

Try using the = as the field sep.

adding whole line

awk -F '='  { for (sum=0, i=2; i++; i<=NF) {
                         sum += int($(i)  )
                         if(i==NF) {print sum}       } 
           } ' somefile

You should try to modify the code above to print line field numbers and then use the END{} function to show the sum of all the lines.

Hi SkySmart,
As you know well, it always helps those of us who want to help you if we know (1) what operating system and shell you're using, and (2) if you actually show us the output you are hoping to produce from the sample input you provided.

Without (1) you are likely to have people who want to help you waste their time on suggestions that won't work in your environment. Please tell us what operating system and shell you're using.

Without (2), especially with the sparse description of what you are trying to do, you leave people guessing at what output you're trying to produce. Please show us the output you are hoping to produce from your sample input file.

sorry. i thought i provided more than enough information in my original post. but i have no issues providing more.

this script is expected to run on all unix systems. The shell i'm using is /bin/sh.

For the first part of my request, if all the values of all the patterns on all the lines in the data file are added up, the output should be just the resulting sum of the numbers..i.e 504 (just an arbitrary number i picked).

For the second part of my request, if the values of a specific pattern are added up, the output should be just the resulting sum of the values for that specific pattern...i.e. "on__host=400".

Hi try something like:

awk -F, -v pat="$Pattern" '
  {
    n=split($9,F," ")
    for(i=1; i<=n; i++) {
      if(F~"^" pat) {
        split(F,VAL,/=/)
        t+=VAL[2]
      }
    }
  }

  END {
    mess=pat "=" t
    sub(/^=/,x,mess)
    print mess
  }
' somefile

If $Pattern is empty it just renders a total of all fields in $9

1 Like

this worked beautifully. i modified it so that it allows me to choose which lines i want it to read, but instead of showing me the total of the numbers, its printing out all the lines.

What is wrong with the modified code below:

awk -F, -v pat="$Pattern" 'BEGIN{t=0} /'Sun' 'Oct' '15' '22':[0-9][0-9]:[0-9][0-9] '2017',/
  {
    n=split($9,F," ")
    for(i=1; i<=n; i++) {
      if(F~"^" pat) {
        split(F,VAL,/=/)
        t+=VAL[2]
      }
    }
  }

  END {
    mess=pat "=" t
    sub(/^=/,x,mess)
    print mess
  }
' somefile

the above code scans the specific lines i want, but instead of calculating the numbers in field 9, it prints them (the lines) out, and prints an incorrect total number at the end.

i.e. results:

0,data,Sun Oct 15 22:01:23 2017,6906,/var/log/catalina.out,524K,data,529086,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6894 6906,148       
0,data,Sun Oct 15 22:06:23 2017,6906,/var/log/catalina.out,524K,data,529086,emxCommonAppInitialization__Error_while_initializing=0  INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0  java_lang_NoClassDefFoundError=0  java_lang_OutOfMemoryError=0  Master_Item_Service_is_down=0  The_file_or_directory_is_corrupted_and_unreadable=0,6906 6906,448  
0,data,Sun Oct 15 22:11:23 2017,6914,/var/log/catalina.out,524K,data,529580,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6906 6914,100
5

notice the 5 at the end. not sure where its getting 5 from.

Your code prints the lines for 10pm on Sunday, October 15, 2017 (from the 2nd statement in the first line of your awk code. I assume that the value you seem to want from those lines total up to zero. But the calculations in your script (from the statement on lines 2 through 10 in your awk script) are performed on every line in the file; not just those that you print (since the condition on that statement is the empty expression).

Again, you say the output is not what you want, but do not clearly show us what output you are trying to produce.

And, you have not shown us the input that you are feeding this script. So, we have no idea where the 5 is coming from either. If your input file contained the line:

,,,,,,=5

plus the first three lines shown in your output and $pattern expands to an empty string, you would likely get the output you showed us.

1 Like

Without knowing the pattern nor the input file, it's difficult to explain that behaviour, esp. if you say it "worked beautifully" before. Was that with the same data?

maybe if i word this differently, it'll be clearer.

the data file that the awk code is scanning as many lines similar to this (and they all have different time stamps on them):

0,data,Sun Oct 15 22:01:23 2017,6906,/var/log/catalina.out,524K,data,529086,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6894 6906,148       
0,data,Sun Oct 15 22:06:23 2017,6906,/var/log/catalina.out,524K,data,529086,emxCommonAppInitialization__Error_while_initializing=0  INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0  java_lang_NoClassDefFoundError=0  java_lang_OutOfMemoryError=0  Master_Item_Service_is_down=0  The_file_or_directory_is_corrupted_and_unreadable=0,6906 6906,448  
0,data,Sun Oct 15 22:11:23 2017,6914,/var/log/catalina.out,524K,data,529580,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6906 6914,100

the lines i care about, in this particular scenario are all lines that have the 10PM timestamp on them.

ok, so when the awk code finds those lines that contain 10PM on them, it should do what my original request in the original post of this thread was about....which is to add up all the numbers in the 9th field, provided a specific pattern was not specified.

if a specific pattern to search for in field 9 was not provided, then the "$pattern" variable would be empty. And yes, in this case, it is empty. i did not specify a pattern.

so

 pattern=""

i hope its clear now.

OK.

So now you have shown us three out of an unknown number of lines in your input file and you have told us how pattern is set.

Is it really that hard for you to actually show us a few more lines of input that would allow us to test code that might meet your needs so we know that we are correctly extracting only the desired lines from input files AND show us the output you want your script to produce?

all the lines will look similar to this:

0,data,Sun Aug 15 22:01:23 2017,6906,/var/log/catalina.out,524K,data,529086,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6894 6906,148
0,data,Sun Aug 15 22:06:23 2017,6906,/var/log/catalina.out,524K,data,529086,emxCommonAppInitialization__Error_while_initializing=0  INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0  java_lang_NoClassDefFoundError=0  java_lang_OutOfMemoryError=0  Master_Item_Service_is_down=0  The_file_or_directory_is_corrupted_and_unreadable=0,6906 6906,448
0,data,Sun Aug 15 22:11:23 2017,6914,/var/log/catalina.out,524K,data,529580,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6906 6914,100
0,data,Sun Sep 15 22:01:23 2017,6906,/var/log/catalina.out,524K,data,529086,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6894 6906,148
0,data,Sun Sep 15 22:06:23 2017,6906,/var/log/catalina.out,524K,data,529086,emxCommonAppInitialization__Error_while_initializing=0  INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0  java_lang_NoClassDefFoundError=0  java_lang_OutOfMemoryError=0  Master_Item_Service_is_down=0  The_file_or_directory_is_corrupted_and_unreadable=0,6906 6906,448
0,data,Sun Sep 15 22:11:23 2017,6914,/var/log/catalina.out,524K,data,529580,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6906 6914,100
0,data,Sun Oct 15 22:01:23 2017,6906,/var/log/catalina.out,524K,data,529086,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6894 6906,148
0,data,Sun Oct 15 22:06:23 2017,6906,/var/log/catalina.out,524K,data,529086,emxCommonAppInitialization__Error_while_initializing=0  INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0  java_lang_NoClassDefFoundError=0  java_lang_OutOfMemoryError=0  Master_Item_Service_is_down=0  The_file_or_directory_is_corrupted_and_unreadable=0,6906 6906,448
0,data,Sun Oct 15 22:11:23 2017,6914,/var/log/catalina.out,524K,data,529580,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6906 6914,100
0,data,Sun Nov 15 22:01:23 2017,6906,/var/log/catalina.out,524K,data,529086,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6894 6906,148       
0,data,Sun Nov 15 22:06:23 2017,6906,/var/log/catalina.out,524K,data,529086,emxCommonAppInitialization__Error_while_initializing=0  INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0  java_lang_NoClassDefFoundError=0  java_lang_OutOfMemoryError=0  Master_Item_Service_is_down=0  The_file_or_directory_is_corrupted_and_unreadable=0,6906 6906,448  
0,data,Sun Nov 15 22:11:23 2017,6914,/var/log/catalina.out,524K,data,529580,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0 ,6906 6914,100

The search pattern must be before the opening brace, not on the line above it (otherwise it is seen by awk as two separate condition-action sequences):

awk -F, -v pat="$Pattern" '
  BEGIN {
    t=0
  }

  /Sun Oct 15 22:[0-9][0-9]:[0-9][0-9] 2017,/ {
[..]

And you should leave out al those single quotes..

--
Note: this is also what Don Cragun mentioned in post #7

1 Like

seems to be working now.

thanks all!!!