print unique values of a column and sum up the corresponding values in next column

Hi All,
I have a file which is having 3 columns as (string string integer)
a b 1
x y 2
p k 5
y y 4
.....
.....

Question:
I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the output as :
b 1
k 5
y 6

please help.
-amigarus

gawk '{
    a[$2]=a[$2]+$3
}END{
    # for(i in a) print i,a | "sort"
    b=asorti(a,c)
    for(o=1;o<=b;o++){
        print c[o],a[c[o]]
    }
}' file

thanks ichigo.
can you explain the programme?
Is there a way it can be done without using awk or may be without sed... just a thought...

Yes, you can use Perl -

$
$ cat f3
a b 1
x y 2
p k 5
y y 4
$
$ ##
$ sort -k2,2 f3 |
> perl -F'\s+' -lane 'if ($x ne $F[1] && $x ne "") {print "$x $y"; $y=0};
>                     $x=$F[1]; $y+=$F[2]; END {print "$x $y"}'
b 1
k 5
y 6
$
$

tyler_durden

Can anyone please explain this code. being new to unix I'm not getting this.
Thanks

~/unix.com$ cat file
x y 2
a b 1
p k 5
y y 4
b a 9
c b 2
~/unix.com$ awk '{a[$2]+=$3}END{for (i in a) print i,a}' file
a 9
b 3
k 5
y 6

This script is the same as above but lacks the sorting function, as awk seems to sort things itself...
the first {...} block creates an array a indexed with second parameter $2, adding $3.
From my file above, when 'b' on the $2nd column is encountered the first time, you end up with a['b']=1, then the second time (on the last line), a['b']+=2 (a['b'] is already 1, plus 2 = 3). The same goes for every $2nd element.
the second {...} block takes each index from the a array, store them in i, then displays i and the corresponding a['i']
Is it clear enough? :o

1 Like

yeah, its very much clear.

Thanks a lot, Tukuyomi.