AWK sample variance

I would like to calculate
1/n [sum (x-average)^2]

In awk, I wrote the following line for the sigma summation:

{ summ+=($1-average)^2 }

Full code:

BEGIN { print "This script calculates error estimates"; sum=0 }
{ sum+=$1; n++ }
END { average = sum/n }
BEGIN { summ=0 }
{ summ+=($1-average)^2 }
END { print "error estimate:", "Avg:", average, "Samples:", n, summ, "Co:", summ/n, "Estimator:", summ/(n*n-n), "Error:", sqrt(2/(n-1))*(summ/(n*n-n)), "Variance estimate upper:", summ/(n*n-n)+sqrt(2/(n-1))*(summ/(n*n-n)), "Variance estimate lower:", summ/(n*n-n)-sqrt(2/(n-1))*(summ/(n*n-n)) }

This does not seem to be working.

If you want to read the file twice:

awk '
FNR == NR {
    # first pass
    next
}

{
    # second pass
}
' file file
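
A concrete version of the two-pass idea for this problem might look like the sketch below. The sample numbers and the shell variable names (data, two_pass) are made up for illustration; the awk variable names follow the original script.

```shell
# Hypothetical sample data, one number per line
data=$(mktemp)
printf '2\n4\n4\n4\n5\n5\n7\n9\n' > "$data"

two_pass=$(awk '
FNR == NR {            # first pass: NR and FNR agree only on the first file
    sum += $1
    n++
    next
}
FNR == 1 {             # start of the second pass: the mean is now known
    average = sum / n
}
{                      # second pass: accumulate squared deviations
    summ += ($1 - average)^2
}
END {
    print average, summ / n
}
' "$data" "$data")

echo "$two_pass"
rm -f "$data"
```

For this input the mean is 5 and 1/n [sum (x-average)^2] is 4, so it prints "5 4".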

I guess it is working, but perhaps it does not do what you expect. You can have multiple BEGIN and/or multiple END patterns in awk (although maybe not in all implementations), but all BEGIN actions run before any input is read and all END actions run after all input has been read, regardless of where they appear in the program. That means average is still unset when the summ rule runs on each line. So if you need the individual elements as well as the average, store the elements in an array while accumulating the sum, and then, in the END action, loop through the array to compute summ and do the rest of your calculations.
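
The execution order can be checked with a small experiment. This is only a sketch: the input (10 and 20), the hard-coded average and the printed labels are made up for illustration.

```shell
order=$(printf '10\n20\n' | awk '
BEGIN { print "BEGIN 1" }
      { print "line", $1, "average is [" average "]" }   # average is not set yet
END   { average = 15; print "END 1" }
BEGIN { print "BEGIN 2" }
END   { print "END 2, average is", average }
')
echo "$order"
```

Both BEGIN lines come out first, then the two input lines with an empty average, then both END lines. The per-line rule never sees a value that is only assigned in END.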

Yes, it works but doesn't do what I need it to do.

If I set things in an array, then I would want the following:

arr[($1-average)^2] 

Then I would need to sum all the elements in the array.
How might one do this?

For example:

awk '
 BEGIN {sum=0;summ=0}
{
  arr[NR]=$1
  sum+=$1
}
END {
  average=sum/NR

  for (i=1;i<=NR;i++) {
     summ+=(arr[i]-average)^2
  }
  print summ
}
' file

Please note that in the END block NR holds the number of the last line read, so I can use it both for the average and as the bound of the for loop.

Question:

If I want to find the sum of the square value of all elements in an array,
would that be

summm+=arr[i^2]

The problem is that I am trying to find the variance using var(X)=<X^2>-<X>^2
and I am getting a negative value (wrong, since var>=0 ALWAYS). So I suspect
my procedure is wrong.

arr[i] is the value read from line i. That means that if i=4, arr[i^2] returns the value from the 16th line, not the square of the value on the 4th line.
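
A small check of the difference, using four made-up values and i=2 so that i^2 is still a valid line number (the variable name index_demo is just for this sketch):

```shell
index_demo=$(printf '3\n5\n7\n9\n' | awk '
{ arr[NR] = $1 }
END {
    i = 2
    print "arr[i]^2 =", arr[i]^2     # square of the value on line 2
    print "arr[i^2] =", arr[i^2]     # value stored for line 4
}
')
echo "$index_demo"
```

This prints "arr[i]^2 = 25" and "arr[i^2] = 9", so the sum of squares needs summm+=arr[i]^2.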

I suspected that.

Could I do

summm+=$1^2

You can test it before asking if it could work!
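
A quick test along those lines, as a sketch with made-up input and hypothetical names (sumsq, mean, one_pass). It accumulates the sum and the sum of squares in one pass and applies var(X)=<X^2>-<X>^2 in END:

```shell
one_pass=$(printf '2\n4\n4\n4\n5\n5\n7\n9\n' | awk '
{
    sum   += $1        # running sum of x
    sumsq += $1^2      # running sum of x^2
}
END {
    mean = sum / NR
    print sumsq / NR - mean^2   # var(X) = <X^2> - <X>^2
}
')
echo "$one_pass"
```

For this input <X^2> is 29 and <X>^2 is 25, so it prints 4, the same value as 1/n [sum (x-average)^2]. With floating-point data whose mean is large compared to its spread, this form can lose precision to cancellation and may even come out slightly negative; the array or two-pass form is safer in that case.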