Finding standard deviation for all columns in a data file

ks_reddy · April 23, 2012, 3:55am

Hi All,

I would like someone to modify the script below, taken from this forum, so that it works for all columns in the file (instead of printing the mean and standard deviation for column 3 only). I don't know how to loop over all the columns.
Calculating the Standard Deviation for a column

awk '{ lines=FNR; arr[lines]=$3; sum+=$3 }
     END { avg=sum/lines
           sum=0
           for (i=1; i<=lines; i++) {
               v=arr[i]-avg    # deviation of row i from the mean
               sum+=v*v
           }
           printf("n=%d avg=%f  stddev=%f\n", lines, avg, sqrt(sum/(lines-1))) }' filename

Thanks a lot.
Sidda

ananthap · April 23, 2012, 5:08am

For

{ lines=FNR; arr[lines]=$3; sum+=$3}

substitute

{  
for x=1 to nf { 
     lines++ ;
     arr[lines]=$x; sum+=$x ;
   }


}

i.e., instead of hard-coding the third field, loop over the fields using the built-in variable NF. Note that lines is incremented for each value read in, so it ends up counting values across all fields rather than input lines.

ks_reddy · April 23, 2012, 7:20am

Hi Anantha,

I am getting a syntax error after modifying the original script as shown below (by substituting the portion of code you suggested).
Please correct the code so that it runs without errors.

awk '{for x=1 to nf \
{lines++ ;arr[lines]=$x; sum+=$x ;}} \
END{ avg=sum/lines; sum=0; \    
 for(i=1; i<=lines; i++)  \
     	{ v=arr-avg;  sum+= v*v }\
 printf("n=%d avg=%f  stddev=%f\n",lines, avg, sqrt( sum/( lines - 1) ) ) } ' filename

ananthap · April 23, 2012, 8:40pm

Actually I posted the code so that you could try it out yourself.

What did you try? Post some test data and we will see.

OK
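
For reference, here is a minimal sketch of one way to get a separate mean and sample standard deviation for each column, using the same two-pass idea as the original script. It assumes whitespace-separated numeric fields and no header row. The function name colstats and the output format are only illustrative and do not come from the posts above.

```shell
# colstats (hypothetical name): print n, mean and sample standard deviation
# for every column of the given file(s), or of standard input if none is given.
colstats() {
  awk '
  {
    # Store each field under its own column index instead of a fixed $3.
    for (c = 1; c <= NF; c++) {
      n[c]++                 # number of values seen in column c
      arr[c, n[c]] = $c      # keep the value for the second pass
      sum[c] += $c
    }
    if (NF > ncols) ncols = NF
  }
  END {
    for (c = 1; c <= ncols; c++) {
      avg = sum[c] / n[c]
      ss = 0
      for (i = 1; i <= n[c]; i++) {
        v = arr[c, i] - avg  # deviation from the column mean
        ss += v * v
      }
      # Sample standard deviation (divide by n-1); report 0 for a single value.
      sd = (n[c] > 1) ? sqrt(ss / (n[c] - 1)) : 0
      printf("col=%d n=%d avg=%f stddev=%f\n", c, n[c], avg, sd)
    }
  }' "$@"
}
```

Called as colstats filename, it prints one line per column. For a three-line file containing "1 10", "2 20" and "3 30", it should report avg=2.000000 stddev=1.000000 for column 1 and avg=20.000000 stddev=10.000000 for column 2. All values are held in memory, as in the original script, so for very large files a running sum of squares may be a better fit.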