Add all 4 column entries for similar column ids

Hi,

I want to write a script which will add the entries in all columns for the same column id. I can do it in excel, but I need to do this for 384 columns which will come down to 96 (384/4). How can I do this iteratively

A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3

output

A B C
2 12 18
4 208 190
26 8 16

Thanks,

An awk approach:

awk '
        NR == 1 {
                for ( i = 1; i <= NF; i++ )
                {
                        A = $i
                        H[$i]
                }
                for ( k in H )
                        printf "%s ", k
                printf "\n"
        }
        NR > 1 {
                for ( i = 1; i <= NF; i++ )
                        R[A] += $i
                for ( k in R )
                {
                        printf "%s ", R[k]
                        R[k] = 0
                }
                printf "\n"
        }

' file
1 Like

Here is a solution in awk:

awk 'NR==1 {
   for(i=1;i<=NF;i++) C[$i]=C[$i]" "i
   for(col in C) printf col" "
   printf "\n"
   next
}
{for(col in C) {
   T=0
   for(i=split(C[col],v);i;i--) T+=$v
   printf T" "
 }
 printf "\n"
} '

Thanks Yoda. It worked but the output is not tab delimited and also it changed the columns order. Can I get it in tab delimited format?

---------- Post updated at 12:51 PM ---------- Previous update was at 12:50 PM ----------

Thanks Chubler_XL. I tried running it using sh script.sh file.txt >out.txt. Its taking forever to complete.

I'm a little concerned with your solution Yoda.

Is it guaranteed that the totals printed are going to match the headings, i.e. will for (k in H) be in the same order as for (k in R) ?

Also consider the following input:

A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3
5 6 7 8

---------- Post updated at 07:01 AM ---------- Previous update was at 06:53 AM ----------

Reason my original solution took so long was I forgot to specify the input file!

This adjustment keeps same column order as original file

awk 'NR==1 {
   for(i=1;i<=NF;i++) {
      G=$i
      if (!($i in C)) {
          C[$i];
          H[++cols]=$i
      }
   }
   for(i=1;i<=cols;i++) printf "%s\t", H
   printf "\n"
   next
}
{
 for(i=1;i<=NF;i++) T[G]+=$i
 for(i=1;i<=cols;i++) {
     printf "%s\t",0+T[H]
     T=0
 }
 printf "\n"
} ' infile
1 Like

Thanks Chubler_XL. It worked as I wanted.(did not change the order and the output is tab delimited). thanks again.

With Perl:

perl -lane'
  $" = "\t";
  do {
    @f = @F;
    $t{$_}++ or push @o, $_ for @F;
    print "@o";
    next 
    } if $. == 1;
  $s{$f[$_]} += $F[$_] for 0..$#F;
  print "@{[map $s{$_}, @o]}"; 
  %s = ();
  ' infile

@ Chubler_XL : your code in
#5 I am getting following output

[akshay@aix tmp]$ cat 1
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3

[akshay@aix tmp]$ awk 'NR==1 {
   for(i=1;i<=NF;i++) {
      G=$i
      if (!($i in C)) {
          C[$i];
          H[++cols]=$i
      }
   }
   for(i=1;i<=cols;i++) printf "%s\t", H
   printf "\n"
   next
}
{
 for(i=1;i<=NF;i++) T[G]+=$i
 for(i=1;i<=cols;i++) {
     printf "%s\t",0+T[H]
     T=0
 }
 printf "\n"
} ' 1

A    B    C    
2    12    18    
6    190    208    
32    198    224    

Will try to reduce number of loop here....

@Diya123 : Try below code

 awk 'NR==1{
             for(i=1;i<=NF;i++){
                                          A[$i]++
                                          if(A[$i]==1){
                                                        B=$i
                                                        header = header ? header OFS $i : $i
                                                      }
                               }
             print header
           }
       NR>1{
             for(i=1;i<=NF;i++){
                                             tmp += $i
                                             if(B[i+1]){
                                                              printf tmp OFS;tmp=""
                                                       }
                                }
              printf tmp RS
              tmp = 0
          }
        ' OFS=\\t file
A    B    C
2    12    18
4    178    190
26    8    16