Add all 4 column entries for similar column ids

Diya123 · December 11, 2013, 2:50pm

Hi,

I want to write a script which will add the entries in all columns for the same column id. I can do it in excel, but I need to do this for 384 columns which will come down to 96 (384/4). How can I do this iteratively

A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3

output

Thanks,

Yoda · December 11, 2013, 3:33pm

An awk approach:

awk '
        NR == 1 {
                for ( i = 1; i <= NF; i++ )
                {
                        A = $i
                        H[$i]
                }
                for ( k in H )
                        printf "%s ", k
                printf "\n"
        }
        NR > 1 {
                for ( i = 1; i <= NF; i++ )
                        R[A] += $i
                for ( k in R )
                {
                        printf "%s ", R[k]
                        R[k] = 0
                }
                printf "\n"
        }

' file

Chubler_XL · December 11, 2013, 3:43pm

Here is a solution in awk:

awk 'NR==1 {
   for(i=1;i<=NF;i++) C[$i]=C[$i]" "i
   for(col in C) printf col" "
   printf "\n"
   next
}
{for(col in C) {
   T=0
   for(i=split(C[col],v);i;i--) T+=$v
   printf T" "
 }
 printf "\n"
} '

Diya123 · December 11, 2013, 3:51pm

Thanks Yoda. It worked but the output is not tab delimited and also it changed the columns order. Can I get it in tab delimited format?

---------- Post updated at 12:51 PM ---------- Previous update was at 12:50 PM ----------

Thanks Chubler_XL. I tried running it using sh script.sh file.txt >out.txt. Its taking forever to complete.

Chubler_XL · December 11, 2013, 4:01pm

I'm a little concerned with your solution Yoda.

Is it guaranteed that the totals printed are going to match the headings, i.e. will for (k in H) be in the same order as for (k in R) ?

Also consider the following input:

A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3
5 6 7 8

---------- Post updated at 07:01 AM ---------- Previous update was at 06:53 AM ----------

Reason my original solution took so long was I forgot to specify the input file!

This adjustment keeps same column order as original file

awk 'NR==1 {
   for(i=1;i<=NF;i++) {
      G=$i
      if (!($i in C)) {
          C[$i];
          H[++cols]=$i
      }
   }
   for(i=1;i<=cols;i++) printf "%s\t", H
   printf "\n"
   next
}
{
 for(i=1;i<=NF;i++) T[G]+=$i
 for(i=1;i<=cols;i++) {
     printf "%s\t",0+T[H]
     T=0
 }
 printf "\n"
} ' infile

Diya123 · December 11, 2013, 4:08pm

Thanks Chubler_XL. It worked as I wanted.(did not change the order and the output is tab delimited). thanks again.

radoulov · December 11, 2013, 4:25pm

With Perl:

perl -lane'
  $" = "\t";
  do {
    @f = @F;
    $t{$_}++ or push @o, $_ for @F;
    print "@o";
    next 
    } if $. == 1;
  $s{$f[$_]} += $F[$_] for 0..$#F;
  print "@{[map $s{$_}, @o]}"; 
  %s = ();
  ' infile

Akshay_Hegde · December 11, 2013, 11:57pm

diya123:

Hi,

I want to write a script which will add the entries in all columns for the same column id. I can do it in excel, but I need to do this for 384 columns which will come down to 96 (384/4). How can I do this iteratively
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3
output
A B C
2 12 18
4 208 190
26 8 16
Thanks,

@ Chubler_XL : your code in
#5 I am getting following output

[akshay@aix tmp]$ cat 1
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5  5 3

[akshay@aix tmp]$ awk 'NR==1 {
   for(i=1;i<=NF;i++) {
      G=$i
      if (!($i in C)) {
          C[$i];
          H[++cols]=$i
      }
   }
   for(i=1;i<=cols;i++) printf "%s\t", H
   printf "\n"
   next
}
{
 for(i=1;i<=NF;i++) T[G]+=$i
 for(i=1;i<=cols;i++) {
     printf "%s\t",0+T[H]
     T=0
 }
 printf "\n"
} ' 1

A    B    C    
2    12    18    
6    190    208    
32    198    224

Will try to reduce number of loop here....

@Diya123 : Try below code

 awk 'NR==1{
             for(i=1;i<=NF;i++){
                                          A[$i]++
                                          if(A[$i]==1){
                                                        B=$i
                                                        header = header ? header OFS $i : $i
                                                      }
                               }
             print header
           }
       NR>1{
             for(i=1;i<=NF;i++){
                                             tmp += $i
                                             if(B[i+1]){
                                                              printf tmp OFS;tmp=""
                                                       }
                                }
              printf tmp RS
              tmp = 0
          }
        ' OFS=\\t file

A    B    C
2    12    18
4    178    190
26    8    16