ksh - Need Help Reducing Nested Loops

Azrael · October 23, 2014, 1:16pm

Hello, I pulled out some old code from an unfinished project the other day and wanted to streamline it. I know anything beyond a double loop is usually bad practice, and I've since come up with some logic that would no longer require the first loop in the following code, which works:

#!/bin/ksh

for j in $(ls -l ./lists | awk '{ print $9 }');
   do
      while read line;
        do
          A=`echo $line | sed 's/.*://;s/-/ /g;'`;
          B=`echo $A | awk '{ print $1 }' | cut -d "." -f4`;
          C=`echo $A | awk '{ print $2 }' | cut -d "." -f4`;
          D=`echo $A | awk '{ print $1 }' | cut -d "." -f1-3`;

          while [ $B -le $C ];
            do
              echo $D.$B >> first_list_parsed;
              (( B++ ));
          done;
      done;

      sleep 1;
done < /path/to/lists/first_list

I've tried changing this to the following, but it writes blank output to the first_list_parsed file:

#!/bin/ksh

while read line;
     do
          A=`echo $line | sed 's/.*://;s/-/ /g;'`;
          B=`echo $A | awk '{ print $1 }' | cut -d "." -f4`;
          C=`echo $A | awk '{ print $2 }' | cut -d "." -f4`;
          D=`echo $A | awk '{ print $1 }' | cut -d "." -f1-3`;

          while [ $B -le $C ];
            do
              echo $D.$B >> first_list_parsed;
              (( B++ ));
          done;

done;< /path/to/lists/first_list

Any suggestions welcome

Corona688 · October 23, 2014, 1:21pm

This is bad practice in far worse ways than a triply nested loop. You are using awk and sed to process individual lines. They are full-fledged languages, and I get the impression this whole program could be rewritten as one awk call. Calling them once per line is like starting a web browser, loading a page, and quitting, once for each individual word on the page. Seven times per word (every line starts one sed, three awk and three cut processes). Monstrously inefficient, and not how these tools are intended to be used at all. Using awk to split columns on single, individual lines is especially troublesome; the shell's read and text substitution are fully capable of doing that.
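As a rough illustration of that point, here is a minimal sketch that pulls the same B, C and D values out of one line using only shell parameter expansion, so no sed, awk or cut process is started per line. It assumes the `Name:FIRST_IP-LAST_IP` line format Azrael describes later in the thread; `parse_range` is a hypothetical helper name, and its body runs in a `( ... )` subshell so its variables don't leak into the caller.

```shell
# parse_range LINE  (hypothetical helper, for illustration only)
# Assumed input format: Name:FIRST_IP-LAST_IP
parse_range() (
    line=$1
    ips=${line##*:}      # drop everything up to the last ":" -> 8.20.104.0-8.20.107.255
    first=${ips%-*}      # text before the "-"                -> 8.20.104.0
    last=${ips#*-}       # text after the "-"                 -> 8.20.107.255
    B=${first##*.}       # last octet of the first address    -> 0
    C=${last##*.}        # last octet of the second address   -> 255
    D=${first%.*}        # first three octets                 -> 8.20.104
    printf '%s %s %s\n' "$D" "$B" "$C"
)

parse_range 'Marchex, Inc:8.20.104.0-8.20.107.255'   # prints: 8.20.104 0 255
```

Every step is a built-in expansion, so the cost per line is a handful of string operations rather than seven process launches.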

ls -l | awk '{ print $9 }' is also dangerous and wrong. Why ask ls for extended information and then throw it all away? It will also break apart filenames that contain spaces. for j in ./lists/* does the same thing far more safely, without using ls at all.
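A minimal sketch of that glob-based loop, wrapped in a hypothetical `process_lists` function so the directory is a parameter. The `[ -f ]` check is there because an unmatched glob (for example, in an empty directory) is left as the literal pattern instead of expanding to nothing.

```shell
# process_lists DIR  (hypothetical helper, for illustration only)
process_lists() (
    for j in "$1"/*
    do
        # Skip the unexpanded pattern and anything that is not a regular file.
        [ -f "$j" ] || continue
        # Quoting "$j" keeps names containing spaces in one piece.
        printf '%s\n' "$j"
    done
)

process_lists ./lists
```

In a real script the `printf` line would be replaced by whatever reads the file, for example `while IFS= read -r line; do ...; done < "$j"`.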

Except that you don't appear to be using the value of j anywhere even though you loop over it. What are you trying to do? That part of the loop looks like a complete no-op right now. The actual loop is over /path/to/lists/first_list and nothing else.

All things considered, three loops isn't so bad if written well. Please explain what this code is for and I'll help improve it. The contents of /path/to/lists/first_list would also be extremely helpful to figuring out why your current program isn't working.

Azrael · October 23, 2014, 1:41pm

As I mentioned, I'm trying to remove the for loop over "j" along with "ls -l | awk '{ print $9 }'". Originally I was going to loop through multiple files, but I have decided there are better ways to do this.

The lists contain thousands of lines covering many different IP ranges. Since the pf firewall only handles ranges written in CIDR notation, I'm using sed to strip the text that isn't an IP from lines like this one, and to replace the hyphen between the two IPs with a space:

Marchex, Inc:8.20.104.0-8.20.107.255

Then I used the B variable to hold the last octet of the first IP and C to hold the last octet of the second IP. D holds the first three octets of the IP addresses. After that, the innermost while loop writes each individual IP address to a file that is used as a table in pf.
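One thing worth noting about the B/C/D approach: because only the last octets vary, a range such as 8.20.104.0-8.20.107.255 produces 8.20.104.0 through 8.20.104.255 and skips the .105 to .107 blocks. A sketch of one way around that converts both ends to 32-bit integers and counts between them. It assumes plain dotted-quad IPv4 input without leading zeros (an octet like `08` may be misread as octal by shell arithmetic); `ip2int`, `int2ip` and `expand_range` are hypothetical helper names.

```shell
# ip2int A.B.C.D -> prints the address as one integer (hypothetical helper)
ip2int() (
    IFS=.
    set -- $1                                  # split "a.b.c.d" into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# int2ip N -> prints N back in dotted-quad form (hypothetical helper)
int2ip() {
    printf '%d.%d.%d.%d\n' \
        $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
        $(( ($1 >> 8) & 255 ))  $(( $1 & 255 ))
}

# expand_range FIRST LAST -> prints every address from FIRST to LAST inclusive
expand_range() (
    n=$(ip2int "$1")
    end=$(ip2int "$2")
    while [ "$n" -le "$end" ]
    do
        int2ip "$n"
        n=$((n + 1))
    done
)

expand_range 8.20.104.254 8.20.105.1
```

The example range crosses from .104 into .105, which the last-octet-only loop would not cover.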

I'm sure there are easier and better ways to do a lot of this. I just thought I'd focus on one thing rather than ask anyone to overhaul the whole thing. However, I'm open to any and all suggestions. As this was my first ksh script, it's very convoluted and not where I'd like it to be. Thanks

Corona688 · October 23, 2014, 2:02pm

Ah, so for 8.20.104.0-8.20.107.255 you wanted

8.20.104.0
8.20.104.1
...
8.20.104.255

?

OLDIFS="$IFS" ; IFS=".-" # Alter the default splitting in the shell
while IFS=":" read CORP IPS
do
        # Splits on . and -, so we get $1=8, $2=20, $3=104, $4=0, $5=8, ...
        set -- $IPS
        # Print a sequence of numbers, with $1.$2.$3. prepended
        seq -f "$1.$2.$3.%1.0f" $4 $8

        # I'm not sure this is a portable 'seq' option.
        # If it doesn't work,
        # N=$4 ; while [ "$N" -le "$8" ] ; do echo "$1.$2.$3.$((N++))" ; done
done < inputfile
IFS="$OLDIFS"  # Restore splitting to default
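The original script appends with `>> first_list_parsed` once per address, which reopens the file for every line written. A sketch of the same last-octet loop with a single redirection on the `done` line instead, so the output file is opened once for the whole run; it uses the plain `while` fallback rather than `seq`. `expand_file` is a hypothetical name, and the input is assumed to follow the `Name:FIRST_IP-LAST_IP` format shown earlier.

```shell
# expand_file INPUT OUTPUT  (hypothetical helper, for illustration only)
expand_file() (
    src=$1 dst=$2
    IFS=".-"                              # splitting used by "set -- $IPS"
    while IFS=":" read -r CORP IPS        # CORP = name, IPS = "first-last"
    do
        [ -n "$IPS" ] || continue         # skip lines with no IP part
        set -- $IPS                       # $1..$4 first IP, $5..$8 last IP
        N=$4
        while [ "$N" -le "$8" ]
        do
            echo "$1.$2.$3.$N"
            N=$((N + 1))
        done
    done < "$src" > "$dst"                # output file opened once for the loop
)

# Example: expand_file /path/to/lists/first_list first_list_parsed
```

Using `>` here truncates the output file at the start of each run, which is usually what a regenerated pf table file needs; `>>` on the `done` line would keep appending instead.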

And if you wanted to loop over multiple files:

OLDIFS="$IFS" ; IFS=".-" # Alter the default splitting in the shell

cat inputfolder/* | while IFS=":" read CORP IPS # Read a line splitting on ":" alone
do
        # Splits on . and -, so we get $1=8, $2=20, $3=104, $4=0, $5=8, ...
        set -- $IPS
        # Print a sequence of numbers, with $1.$2.$3. prepended
        seq -f "$1.$2.$3.%1.0f" $4 $8

        # I'm not sure this is a portable 'seq' option.
        # If it doesn't work,
        # N=$4 ; while [ "$N" -le "$8" ] ; do echo "$1.$2.$3.$((N++))" ; done
done
IFS="$OLDIFS" # Restore splitting to default

I think I had a script to convert ranges to CIDR somewhere, though. Probably easy enough to catch the 0-255 case.
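Since pf tables accept CIDR blocks directly, collapsing each range into CIDR would avoid writing out every single address. The following is a sketch of the usual greedy method (at each step, take the largest block that is aligned on the current start address and still ends at or before the end address); it is not the script mentioned above, and `to_int` and `range2cidr` are hypothetical names. A range whose last octet runs 0-255 within one third octet comes out as a single /24.

```shell
# to_int A.B.C.D -> prints the address as one integer (hypothetical helper)
to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# range2cidr FIRST LAST -> prints the CIDR blocks covering FIRST..LAST
range2cidr() (
    start=$(to_int "$1")
    end=$(to_int "$2")
    while [ "$start" -le "$end" ]
    do
        # Grow the block (2^bits addresses) while the start address is still
        # aligned to the next size up and that larger block still ends in range.
        bits=0
        while [ "$bits" -lt 32 ] &&
              [ $(( start & ((1 << (bits + 1)) - 1) )) -eq 0 ] &&
              [ $(( start + (1 << (bits + 1)) - 1 )) -le "$end" ]
        do
            bits=$((bits + 1))
        done
        printf '%d.%d.%d.%d/%d\n' \
            $(( (start >> 24) & 255 )) $(( (start >> 16) & 255 )) \
            $(( (start >> 8) & 255 ))  $(( start & 255 )) $(( 32 - bits ))
        start=$(( start + (1 << bits) ))
    done
)

range2cidr 8.20.104.0 8.20.107.255    # prints 8.20.104.0/22
```

A range that is not block-aligned, such as 8.20.104.254-8.20.105.1, comes out as several smaller blocks (here two /31s).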

Aia · October 23, 2014, 2:04pm

@Azrael

The key to improving your ksh script is understanding this part

Otherwise, use only awk. It can read every file without any need for ls, cut, read, echo or any built-in or external shell call.
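That awk-only approach could look something like the sketch below: a single awk process reads every file given on the command line and prints the expanded addresses. Unlike the last-octet loops in this thread, it converts each address to a number so that ranges crossing a third-octet boundary are fully expanded; that choice, the function name `expand_awk`, and the `Name:FIRST_IP-LAST_IP` input format are assumptions.

```shell
# expand_awk FILE...  (hypothetical helper, for illustration only)
expand_awk() {
    awk -F: '
    # Convert a dotted quad to a number (o is a local array).
    function toint(ip,    o) {
        split(ip, o, ".")
        return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    NF >= 2 {
        split($NF, r, "-")                 # r[1] = first IP, r[2] = last IP
        lo = toint(r[1]); hi = toint(r[2])
        for (n = lo; n <= hi; n++)
            printf "%d.%d.%d.%d\n", int(n / 16777216) % 256,
                int(n / 65536) % 256, int(n / 256) % 256, n % 256
    }' "$@"
}
```

For example, `expand_awk ./lists/* > first_list_parsed` would process every list in one pass, with no per-line process launches at all.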

Azrael · October 23, 2014, 5:42pm

Had to leave for a long doctor's appointment. That worked perfectly, Corona688! Learning IFS is something I've put off for much too long.

I'm going to use the code you provided for individual lists. The multiple-file version also works for processing every file in that directory, but I plan to hook this into some lower-level code that initializes them all in separate threads at the same time.
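A shell script can't start threads, but it can get a similar effect with background processes: `&` starts each job without waiting for it, and `wait` blocks until all of them have exited. A minimal sketch, where `run_all_lists` is a hypothetical helper and the per-file command (anything that takes a file name and writes to stdout) is passed in by name:

```shell
# run_all_lists DIR CMD  (hypothetical helper, for illustration only)
# Starts "CMD FILE > FILE.parsed" in the background for every regular file
# in DIR, then waits for all of them to finish.
run_all_lists() (
    dir=$1 cmd=$2
    for f in "$dir"/*
    do
        [ -f "$f" ] || continue
        case $f in
            *.parsed) continue ;;           # skip output from an earlier run
        esac
        "$cmd" "$f" > "$f.parsed" &         # one background process per file
    done
    wait                                    # block until every job has exited
)

# Example (expand_one is a hypothetical per-file function):
# run_all_lists ./lists expand_one
```

Writing each result next to its input is only for illustration; the `case` line keeps a second run from treating earlier output as a new list.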

I'll let you know how this turns out. Thank you very much!!