Hi all,
I am attempting to calculate a running variance for a file containing a column of numbers. I am using the formula variance = sum((x - mean(x))^2) / (n - 1), where the sum runs over every value x from the first row up to the current row, mean(x) is the average of those values, and n is the number of rows up to and including the current row.
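Taken literally, the formula above can be sketched in awk by keeping every value in an array and, on each row, recomputing the mean and re-summing the squared deviations. This is a minimal sketch, assuming one number per line in the first field. The function name `running_var_naive` is hypothetical, and printing 0 on the first row (where n - 1 = 0 and the formula is undefined) is a convention chosen to match the expected output given below.

```shell
#!/bin/sh
# Hypothetical helper: running sample variance computed literally from the
# formula. Every value is kept in an array, and on each row the mean of all
# rows so far is recomputed and the squared deviations are summed again.
# This is O(n^2) overall; it mirrors the formula rather than aiming for speed.
running_var_naive() {
    awk '
    {
        x[NR] = $1                  # remember every value seen so far
        sum += $1
        mean = sum / NR             # mean(x) over rows 1..NR
        ss = 0
        for (i = 1; i <= NR; i++)   # sum of squared deviations from that mean
            ss += (x[i] - mean) ^ 2
        # n - 1 is zero on the first row, so report 0 there by convention
        var = (NR > 1) ? ss / (NR - 1) : 0
        print var
    }'
}

printf '100\n100\n-50\n' | running_var_naive   # prints 0, 0, 7500 on separate lines
```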
For example, given a column of three numbers:
100
100
-50
The variance should be:
0
0
7500
This is because, at row three, mean(x) = (100 + 100 + (-50)) / 3 = 50, so the variance is:
variance = ((100 - 50)^2 + (100 - 50)^2 + (-50 - 50)^2) / (3 - 1) = (50^2 + 50^2 + 100^2) / 2 = 15000 / 2 = 7500
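The row-three result can also be checked with the standard identity that expresses the sum of squared deviations through running totals of x and x^2, which is the kind of quantity a single-pass script can accumulate line by line. The identity is algebraically exact, but in floating point it can lose precision when the values are large compared with their spread.

```latex
\begin{aligned}
\sum_{i=1}^{n} \bigl(x_i - \mathrm{mean}(x)\bigr)^2 &= \sum_{i=1}^{n} x_i^2 - n\,\mathrm{mean}(x)^2 \\
\text{row 3:}\quad \bigl(100^2 + 100^2 + (-50)^2\bigr) - 3 \cdot 50^2 &= 22500 - 7500 = 15000 \\
\text{variance} &= \frac{15000}{3 - 1} = 7500
\end{aligned}
```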
My question is: how can I do this in awk so that it prints the running variance on each line? I am using awk to perform several other mathematical operations on my data, so I would prefer to use it for this operation as well; however, if there is a more appropriate tool for doing this, I would like to hear about it.
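Recomputing every squared deviation on each row costs O(n^2) time over the whole file. A common one-pass alternative is Welford's online algorithm, which updates the mean and a running sum of squared deviations (called `m2` here) in constant memory per row, and avoids most of the precision loss of the sum(x^2) - n*mean^2 shortcut. The sketch below assumes one number per line in the first field; `running_var` is a hypothetical name, and printing 0 on the first row is a convention to match the expected output above.

```shell
#!/bin/sh
# Hypothetical helper: one-pass running sample variance using Welford's
# online algorithm. Only n, mean and m2 are stored, whatever the file size.
running_var() {
    awk '
    {
        n++
        delta = $1 - mean            # distance from the previous mean
        mean += delta / n            # updated mean of rows 1..n
        m2 += delta * ($1 - mean)    # running sum of squared deviations
        # n - 1 is zero on the first row, so report 0 there by convention
        var = (n > 1) ? m2 / (n - 1) : 0
        print var
    }'
}

printf '100\n100\n-50\n' | running_var   # prints 0, 0, 7500 on separate lines
```

Because it is plain awk, the update lines can sit inside an existing awk script next to other per-line calculations; using `print $1, var` instead of `print var` would keep the original value beside its running variance.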
Thanks,
-Jahn