Dear Friends,
I want to know whether it is possible to perform addition as part of the uniq command.
Input:
213.64.56.208 1
213.64.56.208 2
213.64.56.208 3
213.46.27.204 10
213.46.27.204 20
213.46.27.204 30
The above input should be converted as follows
213.64.56.208 6
213.46.27.204 60
The comparison should be done only on the first field (the IP address). The sum of the second field should be presented as the second field in the result. If this cannot be done with the "uniq" command, please let me know other ways to do the same.
Thanks in advance,
awk '{a[$1]+=$2;}END{for (i in a) print i, a[i];}' file
Guru.
Thanks Guruprasad,
I would like to know if this is possible with any command outside the awk family. Please clarify.
It can be done with any programming language, e.g., perl, python, ruby, shell (while loop), sed, etc.
Why not awk? It can be done in nearly any language, of course, but:
1) awk is especially good and efficient for this particular problem
2) everyone has an awk of some sort
Doing the same problem in generic sh, for instance:
while read A B
do
        if [ "$LAST" != "$A" ]
        then
                [ -z "$LAST" ] || echo "$LAST" "$TOTAL"
                LAST="$A"
                TOTAL="$B"
        else
                TOTAL=`expr $TOTAL + $B`
        fi
done < datafile
[ -z "$LAST" ] || echo "$LAST" "$TOTAL"
...and it only works if all lines with the same first field are adjacent in the data. If they're not, you need to sort first:
sort < datafile > /tmp/$$
while read A B
do
        if [ "$LAST" != "$A" ]
        then
                [ -z "$LAST" ] || echo "$LAST" "$TOTAL"
                LAST="$A"
                TOTAL="$B"
        else
                TOTAL=`expr $TOTAL + $B`
        fi
done < /tmp/$$
[ -z "$LAST" ] || echo "$LAST" "$TOTAL"
rm -f /tmp/$$
14 lines, n+1 extra processes, a sort operation, and a temp file for something awk handles natively and faster, in one compact line.
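For what it's worth, the temp file and the per-line expr call can be avoided by piping sort straight into the loop and using shell arithmetic. This is only a rough sketch of the same sort-then-compare idea: the function name sum_sorted is made up for illustration, and it assumes two whitespace-separated columns with an integer in the second.

```shell
# Hypothetical helper (name is illustrative): sum column 2 per distinct
# value of column 1, reading from stdin, in plain POSIX sh.
sum_sorted() {
    # Sort on the first field so equal keys end up adjacent.
    LC_ALL=C sort -k 1,1 | {
        LAST=
        TOTAL=0
        while read -r A B
        do
            if [ "$LAST" != "$A" ]
            then
                # Key changed: print the finished group, start a new one.
                [ -z "$LAST" ] || echo "$LAST $TOTAL"
                LAST=$A
                TOTAL=$B
            else
                TOTAL=$(($TOTAL + $B))
            fi
        done
        # Print the last group. This must sit inside the braces, because
        # the right-hand side of a pipeline may run in a subshell.
        [ -z "$LAST" ] || echo "$LAST $TOTAL"
    }
}

sum_sorted <<'EOF'
213.64.56.208 1
213.64.56.208 2
213.64.56.208 3
213.46.27.204 10
213.46.27.204 20
213.46.27.204 30
EOF
# prints:
# 213.46.27.204 60
# 213.64.56.208 6
```

It is still a sort plus a loop, so the awk version remains shorter and does not need the input sorted at all.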
Would it help if the awk code were better explained?
awk '
# This block gets run once per line. $1 is the first column and $2 is the second.
# A += B acts like A = A + B.
# awk arrays act like perl hashes, so A["string"] is valid.
# So the line might do:
# a["213.64.56.208"] = a["213.64.56.208"] + 3
# ...and so keep a running total for each different value of column 1.
{a[$1]+=$2;}
# This block gets run only once, after EOF.
# "for (i in a)" loops through every array index in a.
# "print i, a[i];" prints the index, then a field separator (space by default),
# then the total held in the array for that index.
END{for (i in a) print i, a[i];}' file
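One thing to be aware of: awk does not specify the order in which "for (i in a)" visits the indexes, so the result lines can come out in any order. If the output should follow the order in which each address first appears (as in the expected output in the question), a second array can record that order. A minimal sketch, where the function name sum_by_key and the array names total and order are just illustrative choices:

```shell
# Hypothetical wrapper (name is illustrative): sum column 2 per distinct
# column 1, printing keys in first-seen order. Reads files or stdin.
sum_by_key() {
    awk '
    # Skip lines that do not have at least two fields.
    NF < 2 { next }
    # First time this key is seen: remember its position.
    !($1 in total) { order[++n] = $1 }
    # Keep a running total per key.
    { total[$1] += $2 }
    # After EOF, walk the keys in the order they first appeared.
    END { for (k = 1; k <= n; k++) print order[k], total[order[k]] }
    ' "$@"
}

sum_by_key <<'EOF'
213.64.56.208 1
213.64.56.208 2
213.64.56.208 3
213.46.27.204 10
213.46.27.204 20
213.46.27.204 30
EOF
# prints:
# 213.64.56.208 6
# 213.46.27.204 60
```

If the order does not matter, the original one-liner is enough; piping its output through sort is another way to get a stable order.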