Mean and Standard deviation

Hi all,
I am new to shell scripting and want to calculate the mean and standard deviation using a shell script.

I have a file with repeating letters and their corresponding durations:
a 0.32
a 0.89
aa 0.34
aa 0.23
au 0.012
au 0.26
SIL 0.34
ai 0.9
b 0.29
bh 0.19
ssil 0.87

I want to calculate the mean and standard deviation for each letter. I am able to calculate the mean for a single letter, but can't do it for the whole data set at once.

grep -e '^a' file | grep -v 'aa'|grep -v 'ai'|grep -v 'au' > test.file
awk '{sum+=$2}END{print "Mean = ",sum/NR}' test.file

Thanks in advance.

Check out this thread... does that help?

Maybe the problem is in the regexp:

You obviously want to match "a", but not "aa" or "ai", etc., right? Since your file has spaces delimiting the first field, use them to limit the matched lines to only the wanted ones:

grep '^a ' file | ...

Notice the space character after the "a" - this will match only the lines starting with "a" but not those starting with "a<something>".
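If the goal is the mean and standard deviation for every letter in one go, another option is to skip grep and let awk group the lines by the first field. The following is only a minimal sketch under some assumptions: the sample data is written to a temporary file just for illustration, the standard deviation is the population one (divide by n, not n-1), and the variable names are made up. Because awk splits fields on any whitespace by default, it should not matter whether the columns are separated by spaces or tabs.

```shell
# Sample data from the question, written to a temporary file for illustration.
datafile=$(mktemp)
printf 'a\t0.32\na\t0.89\naa\t0.34\naa\t0.23\nau\t0.012\nau\t0.26\nSIL\t0.34\nai\t0.9\nb\t0.29\nbh\t0.19\nssil\t0.87\n' > "$datafile"

# One pass: keep count, sum and sum of squares per label (field 1).
stats=$(awk '
NF >= 2 {
    n[$1]++
    sum[$1]   += $2
    sumsq[$1] += $2 * $2
}
END {
    for (k in n) {
        mean = sum[k] / n[k]
        var  = sumsq[k] / n[k] - mean * mean   # population variance
        if (var < 0) var = 0                   # guard against rounding error
        printf "%s\tmean=%.4f\tsd=%.4f\n", k, mean, sqrt(var)
    }
}' "$datafile" | sort)

printf '%s\n' "$stats"
rm -f "$datafile"
```

Since the labels are compared as whole fields, "a" and "aa" end up in separate groups without any extra filtering. For the sample standard deviation, the variance line would need to be rescaled by n/(n-1) for groups with more than one value.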

I hope this helps.

bakunin

The fields are not separated by spaces but by tabs, like two columns.
I need to separate a from ai, au and aa, and similarly b from bh.
Can you please help?

Then replace the space by a tab character in my solution. It is possible to search for any character, you just have to take care that the shell doesn't devour the more fancy characters with a special meaning to it. This is what the single quotes around the regexp are for.

In the following examples, replace "<tab>" with a literal tab character. I just write it that way to make it readable.

This would not work as expected, because the shell treats the unquoted tab as a word separator, so grep gets "c" as the regexp and "d" as a file name:

echo "abc<tab>def" | grep c<tab>d

The following does work:

echo "abc<tab>def" | grep 'c<tab>d'

Notice the difference: the quotation marks around the regexp.
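If typing a literal tab is awkward (some terminals and editors turn it into spaces or trigger completion), one possible workaround is to put the tab into a variable first. This is just a sketch; the variable names "tab" and "matches" and the inline test data are made up for illustration:

```shell
# Hold a literal tab in a variable so it stays visible in the script.
tab=$(printf '\t')

# Double quotes let the shell expand $tab but keep the regexp one argument.
matches=$(printf 'a\t0.32\naa\t0.34\nai\t0.9\na\t0.89\n' | grep "^a${tab}")

# Only the two lines whose first field is exactly "a" remain.
printf '%s\n' "$matches"
```

Here double quotes are needed instead of single quotes, because the shell has to expand the variable while still passing the whole regexp to grep as one word.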

I hope this helps.

bakunin