Add values of similar patterns with awk

SkySmart · June 5, 2016, 2:48am

so my output is this:

session_closed=157
session_opened=151
session_closed=18
session_opened=17

there are two patterns here, but with different values. the two patterns are "session_opened" and "session_closed". i expect there will be many more other patterns.

what i want to do is whenever there's a duplicate of patterns, i want to add up all the numbers of each duplicate, so that the refined output looks like this:

session_opened=168
session_closed=175

i don't want to do a "uniq" here in case there are duplicates with the exact same number. i want to make sure I add up ALL the values of all the patterns.

i would like to do this in awk, but i can't seem to come up with any ideas to get this done.

Don_Cragun · June 5, 2016, 4:22am

Do you mean something like:

awk -F'=' '
{	c[$1] += $2
}
END {	for(i in c) printf("%s%s%s\n", i, FS, c[i])
}' file

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

SkySmart · June 5, 2016, 10:42am

this worked beautifully. any way you can explain what each line is doing for me, please? i know the first line is specifying the delimiter, but how are the numbers for each pattern being added? i do a lot of for loops like this in bash and i'd love to be able to do them all in awk.

thank you!

RavinderSingh13 · June 5, 2016, 12:55pm

Hello SkySmart,

The following may help you; please let me know if you have any questions about it.

awk -F'=' '                                         ##### Use = as the field separator.
{	c[$1] += $2                                 ##### c is an array indexed by the first field ($1). += adds the second field ($2) to the total already stored for that index, so lines with the same $1 are summed together.
}
END {	for(i in c) printf("%s%s%s\n", i, FS, c[i]) ##### The END block runs after all input has been read. The for loop walks every index i of array c and prints the index (the first field), then FS (the field separator, =), then c[i] (the total stored for that index).
}' file                                             ##### file is the input file.

Thanks,
R. Singh

MadeInGermany · June 5, 2016, 1:29pm

You count the $2 values per $1 value.
That means you need a variable per each $1, ideally this is a $1-addressed array. I.e. $1 is the array key.
And the array stores the sum of the $2 values, i.e. each $2 value is added to it.
Because it is unknown how many values are to be added, you need an END section to print the array keys and their values.
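
To see the array approach described above at work, here is a minimal sketch that runs the same awk program on the four sample lines from the first post. The function name sum_by_key is made up for illustration, and the trailing sort is only there because awk does not promise any particular order for "for (i in c)".

```shell
# sum_by_key is a hypothetical helper name, not part of awk.
# It reads key=value lines on stdin and prints one key=total line per key.
sum_by_key() {
	awk -F'=' '
	{	c[$1] += $2		# c[key] holds the running total for that key
	}
	END {	for (i in c) printf("%s%s%s\n", i, FS, c[i])
	}' | sort			# sort only to make the output order predictable
}

sum_by_key <<'EOF'
session_closed=157
session_opened=151
session_closed=18
session_opened=17
EOF
# prints:
# session_closed=175
# session_opened=168
```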

jgt · June 5, 2016, 3:58pm

Then there is the old-fashioned way, from when we only had 1k of memory.

sort <input >sorted
sub_total=0
grand_total=0
prev_desc=
first_pass=Y
while IFS="=" read desc amount
do
if [ $first_pass = "Y" ]
then
   first_pass="N"
   prev_desc=$desc
fi
if [ "$desc" -ne "$prev_desc" ]
then
  echo $prev_desc $sub_total
  final_total=`expr $grand_total + $sub_total`
  sub_total=0
  prev_desc=$desc
fi
sub_total=`expr $sub_total + $amount`
done<sorted
echo $prev_desc $sub_total
grand_total=`expr $grand_total + $sub_total`
echo "Grand Total " $grand_total

Don_Cragun · June 6, 2016, 1:26am

Even with an original Bourne shell and back in the days before test was a shell built-in, the -ne comparison operator was for comparing numeric values, not strings. For strings, the not-equal comparison operator is and was != and I assume the final_total in jgt's script was intended to be grand_total.

With POSIX conforming shells (and small input files), sort and shell might be faster than awk. You might want to try the following:

#!/bin/ksh
sub_total=0
grand_total=0
prev_desc=
sort file | while IFS="=" read desc amount
do	if [ "$desc" != "$prev_desc" ]
	then	# Do not print a subtotal before the first key has been seen.
		[ -n "$prev_desc" ] && printf '%s=%d\n' "$prev_desc" "$sub_total"
		grand_total=$((grand_total + sub_total))
		sub_total=0
		prev_desc=$desc
	fi
	sub_total=$((sub_total + amount))
done
printf '%s=%d\n' "$prev_desc" "$sub_total"
grand_total=$((grand_total + sub_total))
printf 'Grand Total=%d\n' "$grand_total"

Although written and tested using a Korn shell, the above script should work with any POSIX-conforming shell. And, with a 1993 or later version of the Korn shell, if you change all three occurrences of %d in the above script to %.2f, this script can also handle amounts presented with or without a decimal point and up to 2 digits after the decimal point (instead of just processing whole numbers).

MadeInGermany · June 6, 2016, 1:55am

Don, I think by piping into the while loop you force it into a subshell, so at the end the variables are not updated!?
Only ksh derivatives run the last part of a pipeline (here the while loop) in the main shell.
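
A quick way to check how a particular shell handles this is to count lines in a loop at the end of a pipeline and look at the counter afterwards. This is only a small sketch; the variable name count and the three sample lines are invented for the test.

```shell
# Count lines in a while loop that is the last part of a pipeline.
count=0
printf 'a\nb\nc\n' | while read -r line
do	count=$((count + 1))
done
# If the loop ran in the main shell (for example ksh), this prints count=3.
# If it ran in a subshell (for example bash with default settings), the
# increments are lost and this prints count=0.
echo "count=$count"
```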

Don_Cragun · June 6, 2016, 2:12am

Ouch. Yes, mostly. I sometimes forget why I like ksh so much. The standards don't force a subshell to be created in this case, but they do allow a subshell to be used in this case. So, the script I suggested will work with a Korn shell and some other shells, but it will not work with some other shells (including bash). To make it portable to any POSIX-conforming shell, the easy way would be to go back to using a temp file:

#!/bin/ksh
sub_total=0
grand_total=0
prev_desc=
sort file > tmp.$$
while IFS="=" read desc amount
do	if [ "$desc" != "$prev_desc" ]
	then	# Do not print a subtotal before the first key has been seen.
		[ -n "$prev_desc" ] && printf '%s=%d\n' "$prev_desc" "$sub_total"
		grand_total=$((grand_total + sub_total))
		sub_total=0
		prev_desc=$desc
	fi
	sub_total=$((sub_total + amount))
done < tmp.$$
rm -f tmp.$$
printf '%s=%d\n' "$prev_desc" "$sub_total"
grand_total=$((grand_total + sub_total))
printf 'Grand Total=%d\n' "$grand_total"