Hi forum members,
I'm trying to get an average of multiple columns in a csv file using awk. A small example of my input data is as follows:
cu,u3o8,au,ag
-9,20,-9,3.6
0.005,30,-9,-9
0.005,50,10,3.44
0.021,-9,8,3.35
The following code seems to do most of what I want
gawk -F"," 'NR==1;NR>1 {for (i=1;i<=4;i++){if($i>0)a+=$i}}END{ for (i=1;i<=4;i++){printf a/(NR-1)","};printf "\n"}' OFS=, input.csv > average.csv
However, the issue is i'm using the $i>o condition to filter out the -9 values and then dividing by NR to calculate the average, which gives the wrong answer. For the Cu column, ignoring -9 values, I should get an average of 0.0103 but instead I get 0.0077.
I tried using a counter as follows:
gawk -F"," 'NR==1;NR>1 {for (i=1;i<=4;i++){if($i>0)a+=$i}; ++count[$i]}END{ for (i=1;i<=4;i++){printf a/count[$i]","};printf "\n"}' OFS=, input.csv > average.csv
I can't seem to get it to work. Any help would be greatly appreciated.