Calculating Running Variance Using Awk

Hi all,

I am trying to calculate a running variance for a file containing a single column of numbers. I am using the formula variance = sum((x - mean(x))^2) / (n - 1), where the sum runs over every value x from the first row through the current row, mean(x) is the average of those values, and n is the number of rows read so far.

For example, given a column of three numbers:

100
100
-50

The running variance should be (taking the first row, where n - 1 = 0, as 0):

0
0
7500

Because when we get to row three, mean(x) = (100 + 100 + (-50))/3 = 50, and the variance is therefore:

variance = ((100-50)^2 + (100-50)^2 + (-50-50)^2)/(3-1) = (50^2 + 50^2 + (-100)^2)/2 = 15000/2 = 7500

My question is: how do I do this with awk so that it prints the running variance on each line? I already use awk for several other calculations on this data, so I would prefer to use it here as well. If there is a more appropriate tool for this, though, I would like to hear about it.

Thanks,

-Jahn

Done directly from that formula, the mean changes with every new row, so the whole sum has to be recalculated on every line. It's not so much a matter of which 'tool' you use as of storing the data and doing the work:

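awk '{ D[NR] = $0; T += $0 }       # store every value and keep a running total
     NR == 1 { print 0; next }     # one value only: n-1 is 0, so report 0
     {
         V = 0
         A = T / NR                # mean of rows 1..NR
         for (N = 1; N <= NR; N++)
             V += (D[N] - A) * (D[N] - A)   # sum of squared deviations
         V /= (NR - 1)             # sample variance over rows 1..NR
         print V
     }' data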
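
A follow-up note: the loop above does work proportional to the row number on every line, which gets slow on long files. As an alternative, here is a minimal sketch of a one-pass update (the incremental method usually credited to Welford), which is a different technique from the one posted above. It assumes a single numeric column with no blank lines. The names running_var, delta, mean and m2 are just illustrative.

```sh
# Running sample variance in one pass; no array of past values is kept.
running_var() {
    awk '
    {
        n++
        delta = $1 - mean             # distance from the previous mean
        mean += delta / n             # mean of rows 1..n
        m2 += delta * ($1 - mean)     # running sum of squared deviations
        print (n > 1 ? m2 / (n - 1) : 0)   # report 0 for the first row
    }' "$@"
}

# The three-row example from the question:
printf '100\n100\n-50\n' | running_var
```

On the example from the question this should print 0, 0 and 7500, matching the hand calculation. It also reads from a file, e.g. running_var data.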