Adding numbers line by line across different files

I have a set of log files that are in the following format

======= set_1 ========
counter : 315
counter2: 204597
counter3: 290582
======= set_2 ========
counter : 315
counter2: 204597
counter3: 290582
======= set_3 ========
counter : 315
counter2: 204597
counter3: 290582

Is there a good way (pure shell, sed, awk, etc.) to "add" the log files so that I will get a summary where the format is the same, but the numbers are the sum of the numbers in each line?

I'm waiting for some shell magic :smiley: Otherwise I just have to write a c/c++ program or something...

Something like this:

awk -F":" ' !/set/ { a[$1]=a[$1]+$2 } END { for ( i in a ) print i":"a[i] }' logfile

Assuming logfile is a cat of logfiles*

To be specific:

cat log_files* | awk -F":" ' !/set/ { a[$1]=a[$1]+$2 } END { for ( i in a ) print i":"a[i] }'
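For example, with the three identical sets from the question in a single logfile, every sum comes out to three times the single value (note that awk's for-in loop visits the keys in an unspecified order):

```shell
# Sample data from the question: three identical sets,
# so each sum is three times the single value.
cat > logfile <<'EOF'
======= set_1 ========
counter : 315
counter2: 204597
counter3: 290582
======= set_2 ========
counter : 315
counter2: 204597
counter3: 290582
======= set_3 ========
counter : 315
counter2: 204597
counter3: 290582
EOF
awk -F":" ' !/set/ { a[$1]=a[$1]+$2 } END { for ( i in a ) print i":"a[i] }' logfile
# prints, in whatever order for-in happens to use:
#   counter :945
#   counter2:613791
#   counter3:871746
```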

Nice elegant answer, but the array in awk does not keep the insertion order... so the resulting file does not follow the original order in the log files. Is there a way to keep track of the order as well?

If you have gawk:

WHINY_USERS=1 gawk -F: ' !/set/{ a[$1]+=$2 } END{ for(i in a) print i":"a[i] }' log_files*
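For what it's worth, WHINY_USERS was an undocumented switch in the old gawk 3.x line; in gawk 4.0 and later the supported way to get sorted for-in traversal is the PROCINFO["sorted_in"] element, e.g.:

```shell
# gawk >= 4.0: request string-sorted index order instead of WHINY_USERS
gawk -F: '
  !/set/ { a[$1] += $2 }
  END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (i in a) print i ":" a[i]
  }
' log_files*
```

This sorts the counter names alphabetically (which happens to match the original order here), it does not record insertion order.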


If you don't have gawk you can try this:

awk -F: '
!/set/ {
  if(!a[$1]){b[c++]=$1}
  a[$1]+=$2
}
END{
  for(i=0;i<c;i++)
    print b[i] ":" a[b[i]]
}' log_files*

Use nawk or /usr/xpg4/bin/awk on Solaris if you get errors.
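For example, against the sample data from the question this keeps the counters in their original order, since the b array records each name the first time it is seen:

```shell
cat > logfile <<'EOF'
======= set_1 ========
counter : 315
counter2: 204597
counter3: 290582
======= set_2 ========
counter : 315
counter2: 204597
counter3: 290582
======= set_3 ========
counter : 315
counter2: 204597
counter3: 290582
EOF
awk -F: '
!/set/ {
  if(!a[$1]){b[c++]=$1}
  a[$1]+=$2
}
END{
  for(i=0;i<c;i++)
    print b[i] ":" a[b[i]]
}' logfile
# counter :945
# counter2:613791
# counter3:871746
```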

The gawk solution does not work, but the second solution works great. Thanks!

If it is possible for the value of a[$1] to be 0 (e.g., $0=="counter : 0"), then "!a[$1]" would create a second entry in b for $1 the next time a "counter :..." line is encountered, which would affect the number of times the line keyed by $1 appears in the output (though the sum would be correct). Perhaps, "$1 in a" would be better.

Regards,
Alister
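To illustrate the point: with a counter whose running sum is still 0, the !a[$1] test would record the key a second time, while $1 in a records it exactly once:

```shell
# A counter that is 0 in the first set would fool the !a[$1] test;
# testing key existence with ($1 in a) records the name only once.
cat > zero.log <<'EOF'
======= set_1 ========
counter : 0
======= set_2 ========
counter : 5
EOF
awk -F: '
!/set/ {
  if(!($1 in a)){b[c++]=$1}
  a[$1]+=$2
}
END{
  for(i=0;i<c;i++)
    print b[i] ":" a[b[i]]
}' zero.log
# counter :5
```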

I discovered another problem: I really want a line-by-line sum, and there is a problem if one of my counter names is duplicated. So I ended up with the following script (not a slick one-liner, but it gets the job done):

#!/bin/zsh
num_of_lines=$(wc -l < "$1")

# Walk the lines backwards by offset: tail -n $linenum | head -n 1 picks
# the same line (counted from the end) out of every file, so the output
# still comes out in the original top-to-bottom order.
for linenum in $(seq $num_of_lines -1 1); do
  rm -f file2
  for i in "$@"; do
    tail -n $linenum $i | head -n 1 >>! file2
  done
  awk -F":" '{b=$1; a=a+$2} END {printf "%s: %12.0f\n",b,a}' file2
done