I am looking for a solution to the following problem. I have a very large file that looks something like this:
Each group of three numbers on each line is a set of three probabilities that sum to one.
I want to output the maximum for each group of three. So desired output would be:
or
I've written the following kludge, which works just fine on a small subset of this data but does not scale up well (because it involves an ugly triple-nested for loop):
#!/bin/bash
length=$(awk '{n++}END{print n}' file )
width=$(awk 'NR == 1 { print NF }' file )
width=$( expr $width / 3 )
for ((i=1; i <= length; i++))
do
for ((j=1; j <= width; j++))
do
a=$(expr 1 + \( \( $j - 1 \) \* 3 \) )
b=$(expr $a + 2)
awk -v i=$i -v a=$a -v b=$b 'NR == i { for (k = a; k <= b; k++ )if ( $k > max) max = $k } END { print max}' file >> out
done
done
Does anyone know of a solution that will scale up well? Can it be accomplished with basic Unix utilities/shell scripting?
It's not the ugly triply-nested for loop that's making it slow; it's the launching of multiple external processes per iteration. It's wasteful to run awk, grep, sed, and so forth on individual lines -- they are efficient when run over batches of data, but take time to start up and exit. Imagine being allowed to say only one word per telephone call... or, in your case, having to make 10,000 phone calls but with only one word of meaning in each.
You're also using externals like expr when your shell (probably) supports better ways of doing arithmetic. expr is inefficient for the reason listed above, though some shells have nothing better.
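For example, if your shell is bash or ksh (or any POSIX shell with arithmetic expansion), the two expr calls in your inner loop can be replaced with built-in $(( )) arithmetic, which forks no external process -- a sketch, using j=2 just as an illustrative value:

```shell
# Built-in arithmetic expansion instead of forking expr twice per iteration:
j=2
a=$(( (j - 1) * 3 + 1 ))   # index of the first field in group j
b=$(( a + 2 ))             # index of the last field in group j
echo "$a $b"
```

On a tight loop this alone saves two process launches per iteration, though it still leaves the per-line awk calls in place.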
How one could do this more efficiently depends on which utilities are available and which system you have. What are they?
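For instance, if awk is available (as it almost always is), a single awk process can make one pass over the whole file and handle every line and every group at once. This is a sketch, assuming the fields really come in non-overlapping groups of three; the inline printf stands in for your file as hypothetical sample data:

```shell
# One awk process reads all the data; for each line it prints the
# maximum of each non-overlapping group of three fields.
printf '0.1 0.2 0.7 0.25 0.25 0.5\n0.6 0.2 0.2 0.1 0.1 0.8\n' |
awk '{
    out = ""
    for (i = 1; i <= NF; i += 3) {
        max = $i
        if ($(i+1) > max) max = $(i+1)
        if ($(i+2) > max) max = $(i+2)
        out = out (i == 1 ? "" : " ") max
    }
    print out
}'
```

To run it on your actual file, replace the printf pipeline with `awk '...' file`. Because only one process is ever started, the cost is a single pass over the data rather than one awk launch per line per group.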