BASH. Need to extract some numbers and take the average

slackjack · November 1, 2009, 9:15pm

Hey all,

I ran some simulations whose output is hundreds of files. I've used grep to extract the vital information from the files, which has made my task somewhat easier. But I still need to perform some calculations (arithmetic average and geometric average) on the results of the grep command (which are saved to a txt file).

The general format of the output of grep is (just a crude example):

a-4.txt number of eggs used is ## in #  cartons
b-4.txt number of eggs used is ## in #  cartons
b-6.txt number of eggs used is ## in #  cartons
a-6.txt number of eggs used is ## in #  cartons
a-8.txt number of eggs used is ## in #  cartons
b-8.txt number of eggs used is ## in #  cartons
a-10.txt number of eggs used is ## in #  cartons
b-10.txt number of eggs used is ## in #  cartons

My aim here is to get the average number of cartons (#) needed to pack ## eggs. I want to average all the a-4, a-6, a-8, and a-10 cartons, and then all the b-4, b-6, b-8, and b-10 cartons, so my end result is one average number of cartons for a and one for b. As you can imagine, doing this by hand with results that can go up to 100 is very time consuming. Can someone please assist me with a BASH script that can make my job simpler?

Thank you.

daptal · November 1, 2009, 10:38pm

 cat abc.txt | perl -e 'while (<>) { push @{$hash{$1}}, $2 if /^([a-z]+)-\d+\..*\sin\s+(\d+)\s+cartons/; } foreach $k (sort keys %hash) { my $total = 0; $total += $_ for @{$hash{$k}}; printf("%s\t%d\t%.2f\n", $k, $total, $total / @{$hash{$k}}); }'

Replace abc.txt with your file name

HTH,
PL

slackjack · November 1, 2009, 10:57pm

I don't understand Perl, so I can't make modifications to the script. Could you provide some comments or a translation to BASH? I would ideally like to adapt this to many situations.

Scrutinizer · November 1, 2009, 11:36pm

Something like this? (bash code):

grep cartons [ab]-* |
{ while read x x x x x eggs x cartons x; do
    (( total_eggs+=eggs ))
    (( total_cartons+=cartons ))
  done
  echo "A total number of $total_eggs eggs in $total_cartons cartons makes an average of $(( total_eggs/total_cartons )) eggs per carton"
}

$> grep cartons [ab]-*
a-10.txt:number of eggs used is 55 in 4 cartons
a-4.txt:number of eggs used is 97 in 3 cartons
a-6.txt:number of eggs used is 22 in 1 cartons
a-8.txt:number of eggs used is 44 in 9 cartons
b-10.txt:number of eggs used is 87 in 5 cartons
b-4.txt:number of eggs used is 85 in 5 cartons
b-6.txt:number of eggs used is 36 in 6 cartons
b-8.txt:number of eggs used is 88 in 2 cartons

Result:

A total number of 514 eggs in 35 cartons makes an average of 14 eggs per carton
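
The loop above pools the a and b files together. If separate averages per prefix are wanted, as in the original question, a minimal sketch could look like the following. It assumes grep was run on several files, so that every line starts with the file name, and the function name average_cartons_by_prefix is made up for illustration:

```shell
# Sketch: average number of cartons per file, separately for a-* and b-*.
# Reads grep output such as
#   a-10.txt:number of eggs used is 55 in 4 cartons
# on standard input. The function name is illustrative only.
average_cartons_by_prefix() {
  sum_a=0 n_a=0 sum_b=0 n_b=0
  while read -r first x x x x eggs x cartons x; do
    case ${first%%-*} in          # "a-10.txt:number" -> "a"
      a) sum_a=$(( sum_a + cartons )); n_a=$(( n_a + 1 )) ;;
      b) sum_b=$(( sum_b + cartons )); n_b=$(( n_b + 1 )) ;;
    esac
  done
  # Shell arithmetic is integer only, so the averages are truncated.
  if [ "$n_a" -gt 0 ]; then
    echo "a: $sum_a cartons in $n_a files, average $(( sum_a / n_a ))"
  fi
  if [ "$n_b" -gt 0 ]; then
    echo "b: $sum_b cartons in $n_b files, average $(( sum_b / n_b ))"
  fi
}

# Usage: grep cartons [ab]-* | average_cartons_by_prefix
```

Because the loop and the echo statements live in the same function, the totals are still visible when the loop ends, so no extra curly brace block is needed here.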

ghostdog74 · November 1, 2009, 11:44pm

If you want to use Perl, use Perl. There's no need to cat the file; pass it to perl as an argument. Also, don't cram everything into one line like that. It's hard to read and troubleshoot. Indent your code where necessary.

slackjack · November 2, 2009, 2:14pm

scrutinizer:

Something like this? (bash code):

grep cartons [ab]-* |
{ while read x x x x x eggs x cartons x; do
   (( total_eggs+=eggs ))
   (( total_cartons+=cartons ))
  done
  echo "A total number of $total_eggs eggs in $total_cartons cartons makes an average of $(( total_eggs/total_cartons )) eggs per carton"
}

$> grep cartons [ab]-*
a-10.txt:number of eggs used is 55 in 4 cartons
a-4.txt:number of eggs used is 97 in 3 cartons
a-6.txt:number of eggs used is 22 in 1 cartons
a-8.txt:number of eggs used is 44 in 9 cartons
b-10.txt:number of eggs used is 87 in 5 cartons
b-4.txt:number of eggs used is 85 in 5 cartons
b-6.txt:number of eggs used is 36 in 6 cartons
b-8.txt:number of eggs used is 88 in 2 cartons

Result:

A total number of 514 eggs in 35 cartons makes an average of 14 eggs per carton

Hi,

What exactly does the highlighted code do? Also, what if the structure of the string changes? Like if it was:
b-4.txt:85 eggs in 5 cartons

I would be grateful if you could explain how you extracted the numbers.

Scrutinizer · November 2, 2009, 4:38pm

Hi slackjack, the grep statement extracts the lines that contain the word 'cartons' from every file whose name starts with 'a-' or 'b-'. The output is piped into the curly brace block, which is needed because bash runs the while loop of a pipeline in a subshell, so the variables would lose their values outside the loop. Inside the block, the read statement splits each line into fields: fields 1 to 5 and field 7 are discarded into a dummy variable called 'x', field 6 goes into eggs and field 8 into cartons. Any remaining fields are all assigned to the last x.
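
The effect of the curly braces can be tried out in isolation. This is only a small sketch with made-up function names (count_lost, count_kept) and three dummy input lines:

```shell
# Without braces: the loop runs in a subshell of the pipeline, so the
# outer n is never changed.
count_lost() {
  n=0
  printf '%s\n' one two three | while read -r line; do n=$(( n + 1 )); done
  echo "$n"
}

# With braces: the loop and the echo share the same subshell, so the
# echo sees the value the loop built up.
count_kept() {
  n=0
  printf '%s\n' one two three | { while read -r line; do n=$(( n + 1 )); done; echo "$n"; }
}

count_lost    # prints 0 in bash
count_kept    # prints 3
```

Shells such as ksh93 run the last part of a pipeline in the current shell, which is why the braces are a bash concern in particular.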

If the line to look for in those files were "85 eggs in 5 cartons", the read statement would be as follows (with grep -h, so that the file name is not prefixed to each line):

read eggs x x cartons x
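
If grep is run without -h, the first field of that shorter format becomes something like "b-4.txt:85". One possible way to cope with that, sketched here with a hypothetical helper called parse_short_line, is to read the first field as a whole and strip everything up to the colon:

```shell
# Sketch: pull eggs and cartons out of a line like
#   b-4.txt:85 eggs in 5 cartons
# The helper name is illustrative only.
parse_short_line() {
  printf '%s\n' "$1" | {
    read -r first x x cartons x
    eggs=${first#*:}      # "b-4.txt:85" -> "85"; unchanged if there is no colon
    echo "eggs=$eggs cartons=$cartons"
  }
}

parse_short_line 'b-4.txt:85 eggs in 5 cartons'    # prints eggs=85 cartons=5
```

The same expansion leaves a plain "85" untouched, so the helper also accepts lines produced by grep -h.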

rdcwayx · November 3, 2009, 6:38pm

$ cat cartons
a-10.txt:number of eggs used is 55 in 4 cartons
a-4.txt:number of eggs used is 97 in 3 cartons
a-6.txt:number of eggs used is 22 in 1 cartons
a-8.txt:number of eggs used is 44 in 9 cartons
b-10.txt:number of eggs used is 87 in 5 cartons
b-4.txt:number of eggs used is 85 in 5 cartons
b-6.txt:number of eggs used is 36 in 6 cartons
b-8.txt:number of eggs used is 88 in 2 cartons

$ awk ' {if ($1~/^a/) {eggA+=$6;cartonsA+=$8}}; {if ($1~/^b/) {eggB+=$6;cartonsB+=$8}} END {print eggA, "eggs and " ,cartonsA, "cartons in a files, average is " eggA/cartonsA , "\n"  eggB,"eggs and ", cartonsB " cartons in b files, average is " eggB/cartonsB}' cartons
218 eggs and  17 cartons in a files, average is 12.8235
296 eggs and  18 cartons in b files, average is 16.4444
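
With more prefixes than just a and b, hard-coding one pair of variables per group gets unwieldy. A possible generalisation, given here as a sketch only, keys awk arrays on whatever precedes the first "-" in the file name. The wrapper name average_by_prefix is an assumption, and the input is expected to look like the cartons file above:

```shell
# Sketch: per-prefix totals and eggs-per-carton average using awk arrays.
# Reads the named files, or standard input when no file is given.
average_by_prefix() {
  awk '
    {
      prefix = $1
      sub(/-.*/, "", prefix)        # "a-10.txt:number" -> "a"
      eggs[prefix]    += $6
      cartons[prefix] += $8
    }
    END {
      for (p in eggs)
        printf "%s: %d eggs, %d cartons, average %.4f\n",
               p, eggs[p], cartons[p], eggs[p] / cartons[p]
    }
  ' "$@" | sort                     # awk does not guarantee array order
}

# Usage: average_by_prefix cartons
```

The sketch assumes every group has at least one carton; a group whose carton total is zero would make awk stop with a division error.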

slackjack · November 4, 2009, 12:29am

scrutinizer:

One other quick question. I'm having some difficulty finding the geometric average. I need to use exponents for this, and the result may be a floating point number at times. So for the eggs in a-X, the geometric average would be:
(no. of eggs in a-4 * no. of eggs in a-8 * no. of eggs in a-10) ^ (1/3)

How can we adjust the bash script to now find the geometric average?

Thanks again.

---------- Post updated 11-04-09 at 12:29 AM ---------- Previous update was 11-03-09 at 11:18 PM ----------

rdcwayx:

Wow, awk looks like it will make things easier with the arithmetic. Thanks for providing this.

Scrutinizer · November 4, 2009, 3:30am

Hi slackjack, bash only knows integer arithmetic. However, we can use an external program (bc) for this:

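#!/bin/bash
# -h suppresses the file name prefix, so each line starts with the egg count
grep -h cartons [ab]-* |
{ while read eggs x x cartons x; do
    (( total_eggs+=eggs ))
    (( total_cartons+=cartons ))
  done
  # bc -l does the floating point division; printf rounds it to 2 decimals
  printf '%s %.2F %s\n' "A total number of $total_eggs eggs in $total_cartons cartons makes an average of" "$(echo "$total_eggs/$total_cartons" | bc -l)" "eggs per carton"
}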

Or you can use ksh93 which IMHO is a better shell for scripting purposes than bash and faster too. If you do not have it on your system you can download it for free. In ksh the same script would be

#!/bin/ksh
grep -h cartons [ab]-*|
while read eggs x x cartons x; do
  (( total_eggs+=eggs ))
  (( total_cartons+=cartons ))
done
printf '%s %.2F %s\n' "A total number of $total_eggs eggs in $total_cartons cartons makes an average of" $(( total_eggs/$total_cartons.0 )) "eggs per carton"

In both examples the value ".2" determines the precision and the output is:

A total number of 514 eggs in 35 cartons makes an average of 14.69 eggs per carton
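
For the geometric average asked about earlier, multiplying many counts and then taking a root can overflow or lose precision, so one common approach is to average the logarithms and exponentiate the result. The sketch below does that in awk, which has floating point log() and exp() built in. The function name geometric_mean_by_prefix is made up, and the input is assumed to be in the long format with the egg count in field 6:

```shell
# Sketch: geometric mean of the egg counts per file-name prefix,
# computed as exp(mean(log(x))). Counts must be greater than zero.
geometric_mean_by_prefix() {
  awk '
    {
      prefix = $1
      sub(/-.*/, "", prefix)        # "a-10.txt:number" -> "a"
      log_sum[prefix] += log($6)    # log() is the natural logarithm
      n[prefix]++
    }
    END {
      for (p in n)
        printf "%s: geometric mean of %d egg counts is %.2f\n",
               p, n[p], exp(log_sum[p] / n[p])
    }
  ' "$@" | sort                     # awk does not guarantee array order
}

# Usage: grep cartons [ab]-* | geometric_mean_by_prefix
```

To restrict the calculation to particular files, for example only a-4, a-8 and a-10, the grep pattern or file list feeding the function can be narrowed accordingly.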