Sum of numbers in three or more files

Natalie · October 9, 2013, 10:14pm

I have files:
cat file1

cat file2

445 66 77 3  56

jethrow · October 9, 2013, 10:51pm

awk '{for(i=1;i<=NF;i++)t+=$i} END {print t}' file1 file2 file3

Natalie · October 9, 2013, 10:54pm

Is it possible to do it without awk? Just to add something to my code?

jethrow · October 9, 2013, 11:00pm

Not a fan of awk?

for f in file1 file2 file3; do
	# -r keeps backslashes literal; the -n test catches a last line with no newline
	while read -r line || [ -n "$line" ]; do
		for num in $line; do
			(( t += num ))
		done
	done < "$f"
done
echo "$t"
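The `|| [ -n "$line" ]` part of the loop above matters when the last line of a file has no trailing newline: `read` then returns a non-zero status but still fills `line`. A minimal sketch of that behaviour follows, assuming plain decimal integers as input. The function name `sum_lines` is made up for illustration, and it uses `$(( ))` instead of `(( ))` so it also runs under a plain POSIX `sh`.

```shell
# sum_lines: hypothetical helper; sums whitespace-separated integers on stdin.
sum_lines() {
    t=0
    # read fails on an unterminated last line but still sets "line",
    # so the -n test lets the loop body run one more time.
    while read -r line || [ -n "$line" ]; do
        for num in $line; do
            t=$((t + num))
        done
    done
    echo "$t"
}

# The input deliberately ends without a newline after "3 4".
printf '1 2\n3 4' | sum_lines
```

Without the `-n` test, the `3 4` line would be skipped and the result would be 3 instead of 10.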

Natalie · October 9, 2013, 11:04pm

Lol, yes, not a fan. What about echo 5 6 | sh my_code?

jethrow · October 9, 2013, 11:42pm

script.ksh
for num in $(cat "$@"); do ((t+=num)); done; echo "$t"
... or ...
awk '{for(i=1;i<=NF;i++)t+=$i} END {print t}' "$@"

echo 5 6 | sh script.ksh
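One caveat: `(( ))` is a ksh/bash construct, so running the script with `sh` only works where `sh` supports it. A sketch of a variant that sticks to POSIX arithmetic expansion is below. The name `sum_numbers` is invented here, and the input is assumed to contain only plain decimal integers (no leading zeros, no glob characters).

```shell
# sum_numbers: hypothetical helper; sums integers from the named files,
# or from standard input when no file arguments are given.
sum_numbers() {
    total=0
    # cat with no arguments copies standard input, so both call styles work.
    for num in $(cat "$@"); do
        total=$((total + num))
    done
    echo "$total"
}

echo 5 6 | sum_numbers
```

Called as `sum_numbers file1 file2 file3`, it would read the files instead of standard input.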

Natalie · October 10, 2013, 12:04am

Thank you:)

Jotne · October 10, 2013, 2:30am

I try to avoid loops in awk to speed things up. If you have GNU awk, you can do this:

awk '{a+=$1} END {print a}' RS=" |\n" file?
1779

If you would like to store the result in a variable, do this:

var=$(awk '{a+=$1} END {print a}' RS=" |\n" file?)

PS, if you have awk on your system, why not use it?

Natalie · October 10, 2013, 2:33am

jotne:

I try to avoid loops in awk to speed things up. If you have GNU awk, you can do this:
awk '{a+=$1} END {print a}' RS=" |\n" file?
1779
PS, if you have awk on your system, why not use it?

I don't have awk and I'm not a fan of it. Even now I don't understand the expression you wrote for me. I prefer loops and statements; they are clearer to me.

Jotne · October 10, 2013, 2:43am

RS=" |\n" sets the record separator to either a space or a newline, so every number in the file becomes its own record (line). For example,
1 2 3 changes to

1
2
3

a+=$1 adds the first field of each record to the variable a
print a prints the variable a
file? is a shell pattern that matches file followed by any single character, such as file1 to file9
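For anyone whose awk does not accept a regular expression as RS, the same "one number per line, then add up the first field" idea can be sketched with tr doing the splitting. This is an illustrative equivalent, not the command from the post above; the function name `sum_one_per_line` is made up.

```shell
# Show the splitting step on its own: each number ends up on its own line.
printf '1 2 3\n' | tr -s ' \t' '\n\n'

# sum_one_per_line: hypothetical helper; reads files or stdin,
# puts one number per line, then sums the first (only) field.
sum_one_per_line() {
    cat "$@" | tr -s ' \t' '\n\n' | awk '{a+=$1} END {print a+0}'
}

printf '1 2 3\n4 5\n' | sum_one_per_line
```

The `a+0` in the END block just makes sure a number is printed even when the input is empty.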

What system are you on?

MadeInGermany · October 10, 2013, 4:03am

See my post in this forum, "Finding an average".
At the end it divides the sum to find the average.
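The referenced post is not reproduced here, so the following is only a guess at the general shape: count the numbers while summing them, then divide in the END block. The function name `average` is invented for illustration.

```shell
# average: hypothetical helper; prints the mean of all whitespace-separated
# numbers in the named files (or stdin). Prints nothing for empty input.
average() {
    awk '{for(i=1;i<=NF;i++){t+=$i; n++}} END {if (n) print t/n}' "$@"
}

printf '1 2 3\n4 5\n' | average
```

The `if (n)` guard avoids a division by zero when there is no input at all.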

alister · October 10, 2013, 10:33am

While there may be combinations of AWK implementation and operating system on which your suggestion is faster, I compared it against its predecessor on two combinations and yours was slower every time.

$ seq 1000000 | paste - - - - - - - - - - > data

$ wc data
 100000 1000000 6888896 data

$ head -n5 data
1       2       3       4       5       6       7       8       9       10
11      12      13      14      15      16      17      18      19      20
21      22      23      24      25      26      27      28      29      30
31      32      33      34      35      36      37      38      39      40
41      42      43      44      45      46      47      48      49      50

$ tail -n5 data
999951  999952  999953  999954  999955  999956  999957  999958  999959  999960
999961  999962  999963  999964  999965  999966  999967  999968  999969  999970
999971  999972  999973  999974  999975  999976  999977  999978  999979  999980
999981  999982  999983  999984  999985  999986  999987  999988  999989  999990
999991  999992  999993  999994  999995  999996  999997  999998  999999  1000000
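Since the data file holds the integers 1 to 1000000, the total any correct command should report can be worked out ahead of time from the formula n(n+1)/2. A small sketch, assuming a shell with at least 64-bit arithmetic:

```shell
# Expected sum of 1..n, computed with shell arithmetic.
n=1000000
expected=$(( n * (n + 1) / 2 ))
echo "$expected"
```

This prints 500000500000, which is the value to compare the awk results against.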

For each of the following results, the best of 5 runs was chosen.

Cygwin/GAWK 4.1.0:

$ time gawk '{for(i=1;i<=NF;i++)t+=$i} END {print t}' data
500000500000

real    0m1.359s
user    0m1.327s
sys     0m0.015s

$ time gawk '{a+=$1} END {print a}' RS=' |\t|\n' data
500000500000

real    0m2.797s
user    0m2.796s
sys     0m0.030s

Linux/MAWK 1.3.3:

$ time mawk '{for(i=1;i<=NF;i++)t+=$i} END {print t}' data
5e+11

real    0m0.753s
user    0m0.640s
sys     0m0.032s

$ time mawk '{a+=$1} END {print a}' RS=' |\t|\n' data
5e+11

real    0m1.346s
user    0m1.268s
sys     0m0.012s
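The mawk runs report the total as 5e+11 rather than 500000500000, presumably because that version prints large values in its default floating-point format. If the full digits are wanted, an explicit printf format should work in any awk. A sketch, with the made-up function name `sum_fields`:

```shell
# sum_fields: hypothetical helper; same loop as above, but the total is
# printed with an explicit format instead of a bare print.
sum_fields() {
    awk '{for(i=1;i<=NF;i++)t+=$i} END {printf "%.0f\n", t}' "$@"
}

echo '500000000000 500000' | sum_fields
```

This only changes how the result is displayed; it should have no meaningful effect on the timings.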

In my opinion, unless there is a confirmed performance issue and unless the AWK implementation is known, unqualified AWK optimization tips are usually a bad idea (doubly so when advising a novice who is more likely to blindly internalize the advice).

Different awk implementations, and even different versions of the same implementation, implement differing sets of optimization strategies. One example I ran into recently: gawk lazily recomputes $0. As you probably know, POSIX requires recomputing $0 whenever a field is modified. gawk will not perform that recomputation until $0 is referenced (if at all). Here is that optimization in effect:

$ time gawk '{for (i=1;i<=NF;i++) $i=""}' data

real    0m0.594s
user    0m0.593s
sys     0m0.030s

$ time mawk '{for (i=1;i<=NF;i++) $i=""}' data

real    0m1.039s
user    0m0.900s
sys     0m0.060s

Even though it is MAWK that has the speedy reputation, this version of GAWK is much faster because it doesn't recompute $0 after each $i="" (since $0 is never referenced after a field modification, it is never recomputed).
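For readers who have not met the "recompute $0" rule, here is a small sketch of what it means from the script's point of view: after a field is assigned, $0 is rebuilt from the fields joined by OFS (a single space by default). The variable name `rebuilt` is just for illustration.

```shell
# Blank out the second field, then reference $0 so that it has to be rebuilt.
rebuilt=$(echo 'a b c' | awk '{ $2 = ""; print $0 }')
echo "[$rebuilt]"
```

The output is `[a  c]` with two spaces, because the now-empty second field is still joined in. In the timing runs above $0 is never referenced, so an implementation that defers this rebuild can skip it entirely.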

Regards,
Alister

Jotne · October 10, 2013, 11:20am

This was very interesting, and an eye opener. I had never tested this; I just thought it might be slower to run things in a loop. This proves that I may be wrong.
Thanks for taking the time to test.