Sum of three columns - in 4N columns file

f_o_555 · January 6, 2010, 8:02am

Hi All,
happy new year.

I have a file with 4xN columns like

0.0000e+00	0.0000e+00	7.199E+07	7.123E+07	6.976E+07	6.482E+07	5.256E+07	2.523E+07
0.0000e+00	0.0000e+00	8.641E+07	8.550E+07	8.373E+07	7.780E+07	6.309E+07	3.028E+07
0.0000e+00	0.0000e+00	1.017E+08	1.007E+08	9.857E+07	9.159E+07	7.427E+07	3.565E+07
0.0000e+00	0.0000e+00	1.178E+08	1.166E+08	1.142E+08	1.061E+08	8.603E+07	4.130E+07
0.0000e+00	0.0000e+00	1.347E+08	1.333E+08	1.305E+08	1.213E+08	9.832E+07	4.719E+07
0.0000e+00	0.0000e+00	1.521E+08	1.505E+08	1.474E+08	1.370E+08	1.111E+08	5.331E+07

In this case N=2, but it can be any number.
I would like to replace
the 0's in column 1 with the sum of the column N+1,2N+1,3N+1
the 0's in column 2 with the sum of the column N+2,2N+2,3N+2
...
the 0's in column N with the sum of the column N+N,2N+N,3N+N

At the moment I'm using the following script which works fine if N=1

awk -F'\t' '
  (/^[0-9]/ && $1 == 0 && $1 = sprintf("%.4E",$2 + $3 + $4)) || -3
  ' OFS='\t' filename

Any idea how to generalize the code?
Thank you,
Sarah

ahmad.diab · January 6, 2010, 11:07am

Try this and let me know if it is what you want. Use nawk, gawk,
or /usr/xpg4/bin/awk.

From your description above, 4*N = number of columns (NF is always a multiple of 4), so N=NF/4.

nawk '{ for (i=1;i<=NF;i++) { $i=$((NF/4)+i) + $(2*(NF/4)+i) + $(3*(NF/4)+i); printf "%.4E\t",$i}
print "" }'  infile.txt

:D:D

alister · January 6, 2010, 3:01pm

Hi, f_o_555:

I did my best to not make any assumptions regarding your assumptions (I was somewhat puzzled by the choice of negative three to trigger the printing of a record, but I kept it ;))

$ cat f_o_555.awk 
BEGIN { OFS="\t" }
/^[0-9]/ && (N=NF/4) && (i=NF+1) {
    delete sum
    while (--i)
        i>N ? (sum[i%N ? i%N : N]+=$i) : ($i=sprintf("%.4E", sum[i]))
}
-3

$ cat data3
0 0 0 1 2 3 1 2 3 1 2 3
0 0 0 1 2 3 1 2 3 1 2 3
0 0 0 1 2 3 1 2 3 1 2 3

$ awk -f f_o_555.awk data3
3.0000E+00      6.0000E+00      9.0000E+00      1       2       3       1      2       3       1       2       3
3.0000E+00      6.0000E+00      9.0000E+00      1       2       3       1      2       3       1       2       3
3.0000E+00      6.0000E+00      9.0000E+00      1       2       3       1      2       3       1       2       3

$ cat data4
0 0 0 0 1 2 3 4 1 2 3 4 1 2 3 4
0 0 0 0 1 2 3 4 1 2 3 4 1 2 3 4
0 0 0 0 1 2 3 4 1 2 3 4 1 2 3 4

$ awk -f f_o_555.awk data4
3.0000E+00      6.0000E+00      9.0000E+00      1.2000E+01      1       2      3      4       1       2       3       4       1       2       3       4
3.0000E+00      6.0000E+00      9.0000E+00      1.2000E+01      1       2      3      4       1       2       3       4       1       2       3       4
3.0000E+00      6.0000E+00      9.0000E+00      1.2000E+01      1       2      3      4       1       2       3       4       1       2       3       4

Regards,
alister

---------- Post updated at 03:01 PM ---------- Previous update was at 12:29 PM ----------

As I understand the problem, ahmad.diab's solution is incorrect, but it did make me realize that I was over-engineering. Here is a simpler approach inspired by his post. (The only advantage of the modulus-based solution above is that it can readily handle MxN, and not just 4xN, if the hardcoded 4 is parameterized.)

The following gives the same output given the same sample data files (data3 and data4) used above.

$ awk '/^[0-9]/ && $1==0 && (N=NF/4) { for (i=1;i<=N;i++) $i=sprintf("%.4E", $(N+i)+$(2*N+i)+$(3*N+i))} -3' OFS='\t' data

alister
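
To illustrate the remark about parameterizing the hardcoded 4, here is a minimal sketch and not code from the thread. It assumes M groups of N columns each, with the first group receiving the sums of the other M-1 groups. The function name sum_groups and the variable M are hypothetical, the input is read from standard input, and the trailing 1 plays the same role as the -3 above.

```shell
# Hypothetical helper: $1 is the number of column groups M (4 in the thread).
sum_groups() {
    awk -v M="$1" -v OFS='\t' '
        /^[0-9]/ && $1 == 0 && (N = NF / M) {
            for (i = 1; i <= N; i++) {
                s = 0
                for (g = 1; g < M; g++)   # groups 2..M hold the data
                    s += $(g * N + i)
                $i = sprintf("%.4E", s)
            }
        }
        1                                 # always-true pattern: print every record
    '
}

# Same sample line as data3 above (4 groups of 3 columns).
printf '0 0 0 1 2 3 1 2 3 1 2 3\n' | sum_groups 4
```

With M=4 this should behave like the one-liner above; other values of M only make sense if NF is a multiple of M.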

rdcwayx · January 6, 2010, 4:26pm

What's the meaning of " -3 " ?

alister · January 6, 2010, 6:05pm

I don't know. heheh. In the original post's code, it triggers printing of a record that doesn't match the criteria which seem to identify records with data to process. I couldn't part with it.

Technically, any pattern expression that evaluates to non-zero/true will cause its corresponding action to execute. -3 evaluates to true and its corresponding action is absent, so it defaults to printing $0.
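
A small sketch of that behavior, using made-up two-line input rather than the thread's data: a trailing constant pattern with no action prints every record when it is non-zero and nothing when it is zero.

```shell
# Non-zero constant pattern (-3, or the more common 1): every record is printed,
# including the one modified by the first rule.
printf 'a\nb\n' | awk '/a/ { $0 = "A" } -3'

# Zero constant pattern: the rule never fires, so nothing is printed.
printf 'a\nb\n' | awk '/a/ { $0 = "A" } 0'
```

The first command prints A and b on separate lines; the second prints nothing.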

ahmad.diab · January 7, 2010, 2:57am

alister:

Hi, f_o_555:

As i understand the problem, ahmad.diab's solution is incorrect but it did make me realize that i was over-engineering. A simpler approach inspired by his post (the only advantage of the modulus-based solution above is that it can readily handle MxN, and not just 4xN, if the hardcoded 4 is parameterized).

The following gives the same output given the same sample data files (data3 and data4) used above.
$ awk '/^[0-9]/ && $1==0 && (N=NF/4) { for (i=1;i<=N;i++) $i=sprintf("%.4E", $(N+i)+$(2*N+i)+$(3*N+i))} -3' OFS='\t' data
alister

The number of columns is 4*N, not just "N" as in alister's code above, and as f_o_555 mentions in the original post.

So we need f_o_555 to decide how many columns should be processed in the for loop: is it N or 4*N (NF)?

BR
:):)

alister · January 7, 2010, 3:18am

Hello ahmad.diab:

The number of columns is 4*N. The problem statement says that they are divided into four groups. The only columns that will be modified to contain the sum of their counterparts are the first N. Your code is going beyond the first N columns and modifying ones it shouldn't.

Take care,
alister
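
The difference can be seen on a single made-up line with N=2. This is only an illustrative sketch: the function names loop_all and loop_first_n are hypothetical, and both read one record from standard input.

```shell
# Loop over all NF columns: the data columns get overwritten as well,
# and fields past NF count as 0 in the sums.
loop_all() {
    awk '{ N = NF / 4
           for (i = 1; i <= NF; i++) $i = $(N+i) + $(2*N+i) + $(3*N+i)
           print }'
}

# Loop over the first N columns only: the data columns are left untouched.
loop_first_n() {
    awk '{ N = NF / 4
           for (i = 1; i <= N; i++) $i = $(N+i) + $(2*N+i) + $(3*N+i)
           print }'
}

echo '0 0 1 2 1 2 1 2' | loop_all       # 3 6 2 4 1 2 0 0
echo '0 0 1 2 1 2 1 2' | loop_first_n   # 3 6 1 2 1 2 1 2
```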

f_o_555 · January 7, 2010, 4:33am

Thank you for all the replies.
I got the idea of using awk with -3 from Radoulov, although I didn't know what -3 meant. So any other solution is welcome.
I tried the solution

awk '/^[0-9]/ && $1==0 && (N=NF/4) { for (i=1;i<=N;i++) $i=sprintf("%.4E", $(N+i)+$(2*N+i)+$(3*N+i))} -3' OFS='\t' data

but the output I get is the input itself.

I also tried alister's first solution, which he showed to be working, but I still get the input itself as the output.

Maybe the ==0 test does not detect a zero written in floating point?

---------- Post updated at 04:33 AM ---------- Previous update was at 04:16 AM ----------

Sorry, there was a mistake in the input file (one extra tab before the first column). Now it's fixed.
Thank you again,
sarah
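
One plausible reading of that symptom, sketched with a made-up N=1 line: with a leading tab the line no longer matches /^[0-9]/, so the summing rule is skipped and the -3 pattern prints the record unchanged. The floating-point zero itself is not the problem, since awk compares a field such as 0.0000e+00 numerically. The function name fix is hypothetical, and OFS is passed with -v here so the function can read standard input.

```shell
# Same rule as the one-liner in the thread, reading from standard input.
fix() {
    awk -v OFS='\t' '/^[0-9]/ && $1==0 && (N=NF/4) { for (i=1;i<=N;i++) $i=sprintf("%.4E", $(N+i)+$(2*N+i)+$(3*N+i)) } -3'
}

# Clean line: 0.0000e+00 compares equal to 0, so column 1 is replaced by the sum.
printf '0.0000e+00\t1.000E+00\t2.000E+00\t3.000E+00\n' | fix

# Extra tab before the first column: /^[0-9]/ fails and the line comes back as-is.
printf '\t0.0000e+00\t1.000E+00\t2.000E+00\t3.000E+00\n' | fix
```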

ahmad.diab · January 7, 2010, 6:54am

Thanks, alister, for the clarification. As you can see, f_o_555's post was not
very clear to me; since my code was modified and used by you, it means
that I was on the right track but misunderstood the post.

Thanks again man

:D:D:D:D