Avoiding For Loop in Shell Script

I am looking for a solution to the following problem. I have a very large file that looks something like this:

Each group of three numbers on a line is a set of three probabilities that sum to one.

I want to output the maximum for each group of three. So desired output would be:

or

I've written the following kludge which works just fine on a small subset of this data but does not scale up well (because it involves an ugly triple nested for loop):

#!/bin/bash
# The C-style for ((...)) loops below are a bash feature, so run this with bash rather than plain sh.
length=$(awk '{n++}END{print n}' file )
width=$(awk 'NR == 1 { print NF }' file )
width=$( expr $width / 3 )

for ((i=1; i <= length; i++))
do
    for ((j=1; j <= width; j++))
    do
        # a and b are the first and last field numbers of the j-th group of three
        a=$(expr 1 + \( \( $j - 1 \) \* 3 \) )
        b=$(expr $a + 2)
        awk -v i=$i -v a=$a -v b=$b 'NR == i { for (k = a; k <= b; k++ )if ( $k > max) max = $k } END { print max}' file >> out
    done
done

Does anyone know of a solution that will scale up well? Can it be accomplished with basic unix utilities/shell scripting?

Thanks in advance!

It's not the ugly triply-nested for-loop that's making it slow; it's the running of multiple external processes per iteration. It's wasteful to run awk, grep, sed, and so forth on individual lines: they are efficient when run on batches of data, but take time to load and quit. Imagine being only allowed to say one word per telephone call... Or in your case, having to say 10,000 words per phone call but only one of them with any meaning, since each awk call reads the whole file just to use a single line of it.
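
To make the phone-call point concrete, here is a minimal sketch of the two patterns on a toy task (printing the first field of every line). The function names are made up for illustration and are not from the original script. Both print the same thing, but the first starts one awk process per line, and each of those re-reads the whole file, while the second starts awk exactly once.

```shell
#!/bin/sh
# Slow pattern: one awk process per input line, each scanning the whole file.
first_field_per_line() {
    n=$(( $(wc -l < "$1") ))      # arithmetic expansion also strips any padding from wc
    i=1
    while [ "$i" -le "$n" ]; do
        awk -v i="$i" 'NR == i { print $1 }' "$1"
        i=$(( i + 1 ))
    done
}

# Fast pattern: a single awk process handles every line in one pass.
first_field_batch() {
    awk '{ print $1 }' "$1"
}
```

Wrapping each call in `time` on a file of a few thousand lines should show the difference, though the exact numbers will depend on your system.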

You're also using externals like expr when your shell (probably) supports better ways of doing math. expr is inefficient for the reason listed above but some shells have nothing better.
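
As a rough sketch of what "better ways of doing math" can look like, assuming your shell supports POSIX arithmetic expansion (bash, ksh, dash and most modern sh implementations do): the `$(( ))` form computes the same field bounds as the expr calls without starting any external process. The helper name `group_bounds` is hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper: sets a and b to the first and last field numbers
# of the j-th group of three, where j is passed as $1.
group_bounds() {
    a=$(( 1 + ($1 - 1) * 3 ))   # replaces: expr 1 + \( \( $j - 1 \) \* 3 \)
    b=$(( a + 2 ))              # replaces: expr $a + 2
}

group_bounds 4
echo "$a $b"
```

For the fourth group this prints `10 12`. The same form covers the width calculation, for example `width=$(( width / 3 ))`.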

How one could do this more efficiently depends on what utilities are available and what system you have. What are they?

Is this what you were after:

awk '
   { for(i=1;i<=NF;i++) {
      max=$i>max?$i:max;
      if(!(++cnt%3)) {
          print max;
          max=-999;
   }}}
   END { if(cnt%3) print max }' file

@Corona688 - One word per phone call - Nice analogy, I'll file that one away for later use

Another awk:

awk '
function max (a, b) {
  return a>b ? a : b;
}
{ for (i=1; i<=NF; i+=3) {
    # use an explicit format string so the data is never interpreted as a format
    printf "%s ", max( max($i, $(i+1)), $(i+2))
  }
  print ""
}
' INPUTFILE

Yet another:

tr -s ' \t' '\n\n' < file | awk '$0>max+0 {max=$0} !(NR%3) {print max; max=0}' > out

Test run:

$ cat data
0.111 0.111 0.788 0.101 0.800 0.099 0.500 0.255 0.245
0.234 0.675 0.091 0.100 0.088 0.812 0 0 1
$ tr -s ' \t' '\n\n' < data | awk '$0>max+0 {max=$0} !(NR%3) {print max; max=0}'
0.788
0.800
0.500
0.675
0.812
1

Regards,
Alister

My solution will support lines with a number of fields not evenly divisible by 3, which is handy if that situation can occur.

$ cat file
0.111 0.111 0.788 0.101 0.800 0.099 0.500 0.255 0.245 
0.234 0.675 0.091 0.100 0.088 0.812 0 0 1 7
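
For anyone who wants leftover fields handled separately on each line, here is one possible sketch. It is a different, per-line approach and not necessarily how any of the scripts posted above behave. The function name `group_max` is made up for this example. It prints one output line per input line and treats a short final group as a group of its own.

```shell
#!/bin/sh
# Hypothetical helper: print the maximum of each group of three fields,
# one output line per input line; a trailing partial group is still reported.
group_max() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i += 3) {
            max = $i
            # look at up to two more fields, but never past the end of the line
            for (k = i + 1; k <= i + 2 && k <= NF; k++)
                if ($k + 0 > max + 0) max = $k
            out = out (i > 1 ? " " : "") max
        }
        print out
    }' "$@"
}
```

Run as `group_max file` on the two-line sample above, this should print `0.788 0.800 0.500` and then `0.675 0.812 1 7`, with the lone trailing 7 reported as its own group.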

Thanks Everyone! I like that one word per phone call analogy! I now understand the problem much better.