Get the SUM of TWO columns SEPARATELY by doing GROUP BY on other columns

machomaddy · December 27, 2013, 5:46am

My File looks like:

"|" -> Field separator

A|B|C|100|1000
D|E|F|1|2
G|H|I|0|7
D|E|F|1|2
A|B|C|10|10000
G|H|I|0|7
A|B|C|1|100
D|E|F|1|2

I need to do a SUM on Col. 5 and Col.6 by grouping on Col 1,2 & 3

My expected output is:

A|B|C|111|11100
D|E|F|3|6
G|H|I|1|16

Can this be done using awk? I have done this kind of SUM for a single column but I'm stuck up when I'm doing SUM for two columns

balajesuri · December 27, 2013, 6:36am

awk -F '|' '{k = $1"|"$2"|"$3; x[k]+=$4; y[k]+=$5} END{for(k in x){print k"|"x[k]"|"y[k]}}' file

This is just to help you get started. For summing beyond 5 columns, use a loop appropriately.

Akshay_Hegde · December 27, 2013, 7:17am

machomaddy:

My File looks like:

"|" -> Field separator
A|B|C|100|1000
D|E|F|1|2
G|H|I|0|7
D|E|F|1|2
A|B|C|10|10000
G|H|I|0|7
A|B|C|1|100
D|E|F|1|2
I need to do a SUM on Col. 5 and Col.6 by grouping on Col 1,2 & 3

My expected output is:
A|B|C|111|11100
D|E|F|3|6
G|H|I|1|16
Can this be done using awk? I have done this kind of SUM for a single column but I'm stuck up when I'm doing SUM for two columns

If your real file contains many columns then try

awk -F"|" '{    
                 x=$1 FS $2 FS $3
                 for( i=column_start; i<=column_end; i++)
                 A[x FS i] = A[x FS i] ? A[x FS i] + $i : $i
           }
        END{
                 s = gsub( x, FS)
                 for( i in A ){
                                     x = ""; split(i,S,FS)
                                     for( j=1; j<=s; j++ ) 
                                     x = x ? x FS S[j] : S[j] 
                                     if( !U[x]++ )SS[++k]=x
                              }
             for(i=1;i<=k;i++){
                                     printf SS OFS
                                     for( j=column_start; j<=column_end; j++ )
                                     printf j < column_end ? A[SS FS j] OFS : A[SS FS j] RS  
                              }

           }' column_start="4" column_end="5" OFS="|"  file

A|B|C|111|11100
D|E|F|3|6
G|H|I|0|14

change column_start="x" and column_end="x" according to your need.