I would like to compute the average value (column 2) of each group (column 1),
then display the name of the fruit (column 3) whose value is closest to its group's mean.
Here is a simplified version of the problem.
group.txt
#Group Value Fruit
1 8 Orange
1 6.5 Banana
1 6.2 Apple
1 12 Apricot
1 7 Blackberry
2 4 Apple
2 6 Banana
2 6 Apricot
2 3 Blackberry
(8 + 6.5 + 6.2 + 12 + 7) / 5 = 7.94
The fruit closest to the mean is Orange for group 1.
(4 + 6 + 6 + 3) / 4 = 4.75
The fruit closest to the mean is Apple for group 2.
I script a bit in bash, but I don't know where to start with this. Does anyone have an idea? -_-
Welcome to the community, @Thomthom!
Could you start by mentioning your OS, please?
Also, if you have gawk installed, please mention its version: gawk --version.
I'd start with gawk, as it has associative arrays built in and also has the basic math functions that you might need.
Start by calculating the average for each group.
Take the following as a starting point for calculating the average of each group in your sample file. Run it as awk -f thom.awk myInputFile, where thom.awk is:
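# Skip the header line (FNR > 1) and blank lines (NF is 0).
FNR > 1 && NF {
    groupSum[$1] += $2   # running total of column 2 per group
    groupCnt[$1]++       # number of rows per group
}
END {
    # Note: "for (i in array)" visits the groups in an unspecified order.
    for (i in groupSum)
        printf("%s [%.2f]\n", i, groupSum[i] / groupCnt[i])
}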
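If it helps to try this without creating thom.awk first, here is a minimal sketch that wraps the same awk program in a shell function. The function name group_avg is just an illustration, and the trailing sort -n is an addition that assumes numeric group IDs; it is only there to make the output order predictable.

```shell
# Hypothetical helper: print per-group averages for a file laid out like group.txt.
group_avg() {
    awk '
        FNR > 1 && NF {            # skip the header line and blank lines
            groupSum[$1] += $2
            groupCnt[$1]++
        }
        END {
            for (i in groupSum)
                printf("%s [%.2f]\n", i, groupSum[i] / groupCnt[i])
        }
    ' "$1" | sort -n               # awk does not guarantee the loop order
}

# Recreate the sample data from the question.
printf '%s\n' '#Group Value Fruit' \
    '1 8 Orange' '1 6.5 Banana' '1 6.2 Apple' '1 12 Apricot' '1 7 Blackberry' \
    '2 4 Apple' '2 6 Banana' '2 6 Apricot' '2 3 Blackberry' > group.txt

group_avg group.txt
```

On the sample data this should print 1 [7.94] and 2 [4.75], matching the means worked out by hand in the question.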
That's right. You'll need to add another array (possibly) indexed by a group and a fruit with the value of "Value" for each cell. And then find "the closest" to the avg.
You can subtract the "value" from the "avg" and take the absolute value of the difference (as it can be negative); the smallest result is the closest.
Look at the sample starting point code I've provided and try to enhance it based on the above algorithm.
Let us know how it goes and where/if you get stuck.
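For reference, the algorithm above could be sketched roughly as follows. This is one possible reading, not the only way to do it: the function name closest_fruit and the array names value and fruit are made up for illustration, and the extra arrays are indexed by group and row number rather than by group and fruit, so that a fruit listed twice in one group does not overwrite itself. awk has no built-in abs(), so the sign is flipped by hand. On a tie, the first row seen wins, which is an assumption, since the question does not say what should happen.

```shell
# Hypothetical helper: for each group, print the mean and the fruit closest to it.
closest_fruit() {
    awk '
        FNR > 1 && NF {                      # skip the header line and blank lines
            g = $1
            groupSum[g] += $2
            n = ++groupCnt[g]                # row number within this group
            value[g, n] = $2                 # remember every value ...
            fruit[g, n] = $3                 # ... and the fruit it belongs to
        }
        END {
            for (g in groupSum) {
                avg = groupSum[g] / groupCnt[g]
                for (k = 1; k <= groupCnt[g]; k++) {
                    diff = value[g, k] - avg
                    if (diff < 0) diff = -diff   # absolute value by hand
                    if (k == 1 || diff < bestDiff) {
                        bestDiff = diff
                        best = fruit[g, k]
                    }
                }
                printf("%s [%.2f] %s\n", g, avg, best)
            }
        }
    ' "$@" | sort -n                         # assumes numeric group IDs
}

# Sample data from the question, fed on standard input.
closest_fruit <<'EOF'
#Group Value Fruit
1 8 Orange
1 6.5 Banana
1 6.2 Apple
1 12 Apricot
1 7 Blackberry
2 4 Apple
2 6 Banana
2 6 Apricot
2 3 Blackberry
EOF
```

With the sample data this should report Orange for group 1 and Apple for group 2, which agrees with the expected answers in the question. Called with a file name instead (closest_fruit group.txt), it reads that file rather than standard input.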
Once again: is this homework?
No further assistance will be provided unless the above is clarified.
" You'll need to add another array (possibly) indexed by a group and a fruit with the value of "Value" for each cell. And then find "the closest" to the avg. "