Aggregation of huge data

Hi Friends,

I have a file with sample amount data as follows:

 
 -89990.3456
8788798.990000128
55109787.20
-12455558989.90876

I need to strip out the '-' symbol so that every value is treated as an absolute value, and then sum them all up. The record count is around 1 million.

How can I do this?

Regards,
Ravichander

awk '{sub("-",X,$0); sum += $0} END {print sum}' file
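Spelled out with comments, the same approach looks like this (the X in the one-liner above is just an unset variable standing in for an empty replacement string; this sketch assumes one amount per line in file):

awk '{
    sub("-", "", $0)    # drop a leading minus sign so the value is treated as positive
    sum += $0           # coerce the line to a number and add it to the running total
} END {
    print sum           # print the grand total after the last record
}' file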

Thanks Srinishoo, but I am getting exponential output like this:

 
2.38409e+13

Kindly help me on this!

Regards,
Ravichander

awk '{sub("-",X,$0); sum += $0} END {printf "%f\n", sum}' file

The other thing to keep in mind is that awk uses floating-point numbers; it does not have infinite precision. If you print that number to 13 extra decimal places, most of those decimals will be meaningless garbage.
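A small illustration of both points (a sketch; the value here is just a stand-in of the same magnitude as the reported total):

awk 'BEGIN {
    x = 23840912345678.9
    print x              # plain print formats non-integer values with OFMT, "%.6g" by default: 2.38409e+13
    printf "%.2f\n", x   # an explicit format shows the full value: 23840912345678.90
}'

So the %f fix only changes how the total is displayed; the double-precision rounding underneath is still there, which is what the bc suggestion below addresses.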

If you want an exact sum, bc should do the job, provided you convert the data into statements it can evaluate.

awk 'BEGIN { print "Z = 0;" } { sub(/-/, ""); print "Z += ",$1,";" } END { print "Z;" }' inputfile | bc

This prints "Z = 0;" as the first line, then each data line as "Z += number;", and finally "Z;" to print the final sum.
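For instance, with the four sample amounts at the top of the thread, the intermediate program that awk hands to bc would look like this:

Z = 0;
Z +=  89990.3456 ;
Z +=  8788798.990000128 ;
Z +=  55109787.20 ;
Z +=  12455558989.90876 ;
Z;

bc is then meant to evaluate those statements at full precision and print the final value of the variable.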

Hi Corona!

Thanks for your guidance and I have used your code like this:

 awk 'BEGIN { print "Z = 0;" } { sub(/-/, ""); print "Z += ",$1,";" } END { print "Z;" }' asa.txt | bc

where asa.txt has data like:

 
21000000
-3000
3000
-670500
2963700

but I am getting an error of:

 
syntax error on line 1 stdin

Need your help on this!

Regards,
Ravichander

Obviously you receive an error, as we can see the first line of your data is blank.

Try this (untested):

awk 'BEGIN { print "Z = 0;" } NF{ sub(/-/, ""); print "Z += ",$1,";" } END { print "Z;" }' asa.txt | bc
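To illustrate what the NF guard is for: a blank input line has no fields, so without the guard the script emits an assignment with nothing on its right-hand side, and bc rejects that line with a syntax error. For a hypothetical two-line input whose first line is blank and whose second line is 21000000, the unguarded generator would produce:

Z = 0;
Z +=   ;
Z +=  21000000 ;
Z;

The NF pattern skips records that have zero fields, so the malformed "Z += ;" line is never generated.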

Hi Akshay,

I even removed the blank line and tried it, but I am still facing the same issue. I also copied a few records (say, 5 lines) to a new file, and even then it occurs!

Data:

21000000
-3000
3000
-670500
2963700

Command used:

 awk 'BEGIN { print "Z = 0;" } { sub(/-/, ""); print "Z += ",$1,";" } END { print "Z;" }' test.txt

Output:

Z = 0;
Z +=  21000000 ;
Z +=  3000 ;
Z +=  3000 ;
Z +=  670500 ;
Z +=  2963700 ;
Z;

From the output shown above, my understanding is that the value of Z should be incremented, shouldn't it?

Kindly advise me on the same.

Regards,
Ravichander
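Two things are worth noting about that output. First, the awk command only generates a small program as text; nothing is actually summed until that text is piped into bc, which is why Z never changes when awk is run on its own. Second, standard bc implementations (both POSIX and GNU bc) accept only lowercase letters in variable names, so the very first generated line, Z = 0;, would itself be rejected, which would explain the reported syntax error on line 1 of stdin. An untested sketch in the same style, using a lowercase variable:

awk 'BEGIN { print "z = 0;" } NF { sub(/-/, ""); print "z +=", $1, ";" } END { print "z;" }' test.txt | bc

With the five sample values above, this should print 24640200.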

Three weeks ago I suggested the code:

awk -F'|' -v dqANDms='["-]' '
BEGIN {	f=156
	printf("s=0\n")
}
NR > 2 {gsub(dqANDms, "", $f)
	printf("s+=%s\n",  $f)
}
END {	printf("s\n")
}' file | bc

in another thread (Aggregation of Huge files) where you wanted to process the 156th field instead of the 1st field, wanted to strip out double-quote characters if any were present, and had two header lines in your input that were to be ignored. You said that when your input file contained 7 million records my code didn't work, but you weren't able to show any input that caused it to produce a wrong result. Instead of answering requests for sample input that made the suggested scripts fail, you started this new thread.

Simplifying that code for the data you've presented here yields:

awk '
BEGIN {	printf("s=0\n")
}
{	sub(/-/, "")
	printf("s+=%s\n", $1)
}
END {	printf("s\n")
}' test.txt | bc

which, with the sample input you provided in message #8 of this thread, produces the output:

24640200

which still looks like the correct result to me. If this isn't the result you wanted, what were you expecting?

If it matters, the output from awk that the above script feeds into bc is:

s=0
s+=21000000
s+=3000
s+=3000
s+=670500
s+=2963700
s
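As a final check on the precision concern that started the thread, the same script can be run against the four decimal amounts from the first post (here assumed to be saved in a file named amounts.txt); since bc does the arithmetic, no digits are lost to floating point:

awk '
BEGIN {	printf("s=0\n")
}
{	sub(/-/, "")
	printf("s+=%s\n", $1)
}
END {	printf("s\n")
}' amounts.txt | bc

This should print 12519547566.444360128, a total that the awk-only approach could only approximate.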