Possible performance improvement (Bash and flat file)

prafulnama · May 7, 2010, 10:05am

Hello,

I am pretty new to shell scripts and recently wrote one that seems to do what it should, but I am exploring ways to improve its performance and would appreciate some help.

Here is what it does. It's meant to monitor a bunch of systems (it reads in IPs one at a time from a flat file). For each IP, it fetches a set of web pages, parses them to extract certain numbers, compares them against defined thresholds, and alerts if a metric falls outside the threshold range. The catch is that certain metrics require the last 5 observed values, so I store those in a flat file; every time a new value is retrieved from the web page, it is compared against the threshold together with the stored values.

Basically, I am doing everything sequentially with 2 loops: one to read in the IP and the next to do the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a given machine increases. Is there a way to improve this? Intuitively, I feel that because all historical values are stored in a single flat file, something like multiprocessing would not work, since one process would have that file locked. Any ideas?

Thanks,
-p

avronius · May 7, 2010, 10:23am

As the level of complexity increases, it begins to make more sense to utilize a database to manage the changing state of the environment. Maybe look into something simple to start with, like Berkeley DB.

jim_mcnamara · May 7, 2010, 10:35am

One major thing to look at: child process creation. Try to use shell builtins instead of
a lot of backtick ` ` (or $( ... )) constructions.

You can also store your flat file in a variable, so you read it only once:

flatfile=$(< /path/to/my/flatfile)

Then you can step thru the records or create arrays of the data.
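
As a rough sketch of what that could look like (the sample records and the temporary file are made up for illustration, and the "key value" layout is an assumption about the history file), the file is read once and then walked entirely in memory:

```shell
#!/bin/bash
# Hypothetical sample data standing in for the real flat file.
histfile=$(mktemp)
printf '%s\n' 'proxyA-/stats-cpu-1-a 40' 'proxyA-/stats-cpu-1-b 42' > "$histfile"

# Read the whole file once; no external command is needed for this.
flatfile=$(< "$histfile")

# Step through the records held in memory and build two parallel arrays.
keys=()
vals=()
while read -r key val
do
    [ -z "$key" ] && continue      # skip blank lines
    keys+=( "$key" )
    vals+=( "$val" )
done <<< "$flatfile"

echo "loaded ${#keys[@]} records; first is ${keys[0]} = ${vals[0]}"
rm -f "$histfile"
```

Later lookups can then loop over the arrays instead of re-reading the file.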

Corona688 · May 7, 2010, 2:01pm

It would help to see the actual code.

Most systems don't do that kind of locking unless you explicitly ask for it. But having two processes simultaneously read the same file handle wouldn't be a great idea, they might each get half a line or somesuch. If you're just reading flat files line by line, you could try a 'reader' script that reads everything for them and parcels them out individually. That'd have some extra overhead for the extra process and its pipes, but would let more than one reader operate at once.
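
One way to sidestep the shared-file worry altogether, as a sketch only: give every machine its own history file and run the checks as background jobs. Nothing here comes from the original script; check_proxy, the history.<ip> naming and the addresses are hypothetical placeholders for the real per-proxy work.

```shell
#!/bin/bash
# Hypothetical stand-in for the per-proxy work (fetch pages, compare thresholds).
check_proxy() {
    local ip=$1 histdir=$2
    # Each job appends only to its own history file, so jobs never share a file.
    echo "checked $ip" >> "$histdir/history.$ip"
}

histdir=$(mktemp -d)

# Launch one background job per address, then wait for all of them to finish.
for ip in 10.0.0.1 10.0.0.2 10.0.0.3
do
    check_proxy "$ip" "$histdir" &
done
wait

ls "$histdir"
```

Whether this pays off depends on how much of the time is spent waiting on the network rather than in the shell itself.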

I'll need to see your actual code to help you here, I think, at least some of it. What needs to be optimized depends not just on what you're doing, but how you're doing it. If you're new to shell scripting, there are some trivial design mistakes that could be causing slowdowns... excessive use of pipes and/or backticks is particularly bad. If you've got pipe chains on almost every line, there's probably much room for improvement. In my early scripting days I wrote a linewrapper in BASH that fed everything through about 9 sub-processes; it ended up processing at 10 kilobytes per second!

prafulnama · May 11, 2010, 2:32pm

Thanks a lot everyone. I do seem to have a very large number of backticks. I would appreciate help in eliminating them, and any other way of improving performance.

#!/bin/bash
#Retrieve a list of proxies and compare specified metrics against their threshold values. Alert as required.

#Path to the list of proxies
proxylist="proxylist2"

#Path to the list of URLs, metrics and thresholds
metriclist="metriclist1"

#Path to the proxy history file
proxyhistory="proxyhistory1"

#Parse through the list of proxies and check the specified metrics
while true
do
while read line
do
    if [ "$line" ]
    then

        #Ping the machine to check status.
        ping -c 2 $line > /dev/null 2>&1
        status=`echo $?`

        if [ $status -eq 0 ]
        then
            #Retrieve device name using SNMP
            a=`snmpget $line system.sysName.0`
            set -- $a
            devicename=`echo $6`
        #echo "DEVICE - $devicename"

            #Read in a list of URLs, metrics and thresholds and apply them one at a time for each proxy
            while read line1
            do
                if [ "$line1" ]
                then
                    len=`echo $line1 | wc -w`
                    len1=$[len-1]
                    len1a=$[len-2]
                    set -- $line1
                    alertlevel=$1
                    url=$2
                    url1=$2
                    threshold=`echo $line1 | cut -d ' ' -f $len`
                    
                    
                    if [ "$threshold" != "RATE" ]
                    then
                        metric=`echo $line1 | cut -d ' ' -f 3-$len1`
                    elif [ "$threshold" == "RATE" ]
                    then
                        metric=`echo $line1 | cut -d ' ' -f 3-$len1a`
                        rate=`echo $line1 | cut -d ' ' -f $len1`
                    #echo "Rate - $rate"
                    fi

                    #Completing the URL
                    url="https://$line:8082$url"
                #echo "URL - $url"
                
                    #Retrieve the metric value(s) from the URL 
                    value=`retrievemetric1.sh "$url" "$metric"`
                #echo "VALUE = $value"

                    #If the threshold is explicitly defined
                    if [ "$threshold" != "RATE" ]
                    then

                        #Compare the metric value(s) against corresponding thresholds and alert if required
                        len2=`echo $value | wc -w`

                        for (( i = 1; i <= $len2 ; i++))
                        do
                            value1=`echo $value | cut -d ' ' -f $i`
                            check=`thresholdexceed1.sh "$value1" "$threshold"`
                            if [ "$check" == "true" ]
                            then
                                echo "$alertlevel - $devicename -> $metric - $value1. Threshold: $threshold"
                                echo "-------------------------------------------------------------------------------------"
                            fi
                        done

                    #If the threshold is rate based
                    elif [ "$threshold" == "RATE" ]
                    then
                        #Number of values from the URL
                        len2=`echo $value | wc -w`    
            
                        #Flag to check if all values are reflected in history file
                        stringMissing="false"

                        #Check if all values present in history file
                        for (( i = 1 ; i <= $len2 ; i++ ))
                        do
                            stringa="$devicename-$url1-$metric-$i-a"
                            stringb="$devicename-$url1-$metric-$i-b"
                            stringc="$devicename-$url1-$metric-$i-c"
                            stringd="$devicename-$url1-$metric-$i-d"
                            stringe="$devicename-$url1-$metric-$i-e"

                            if ! grep "$stringa" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                            if ! grep "$stringb" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi
                            if ! grep "$stringc" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi    
                            if ! grep "$stringd" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi    
                            if ! grep "$stringe" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi    
                        done
                    
                        #If a value is missing, delete all the ones that are present for that metric
                        if [ "$stringMissing" == "true" ]
                        then
                            grep -v "$devicename-$url1-$metric-" "$proxyhistory" > "temp"
                            mv "temp" "$proxyhistory"

                            #Create all the required strings and initialize them
                            for (( i = 1 ; i <= $len2 ; i++ ))
                            do
                                val=`echo $value | cut -d ' ' -f $i`
                                stringa="$devicename-$url1-$metric-$i-a $val"
                                stringb="$devicename-$url1-$metric-$i-b 0"
                                stringc="$devicename-$url1-$metric-$i-c 0"
                                stringd="$devicename-$url1-$metric-$i-d 0"
                                stringe="$devicename-$url1-$metric-$i-e 0"
            
                                echo "$stringa" >> "$proxyhistory"
                                echo "$stringb" >> "$proxyhistory"
                                echo "$stringc" >> "$proxyhistory"
                                echo "$stringd" >> "$proxyhistory"
                                echo "$stringe" >> "$proxyhistory"

                            done    
                    
                        #If all the required strings are present
                        elif [ "$stringMissing" == "false" ]
                        then
                    
                            for (( i = 1 ; i <= $len2 ; i++ ))
                            do
                                val=`echo $value | cut -d ' ' -f $i`
                                stringa="$devicename-$url1-$metric-$i-a"
                                stringb="$devicename-$url1-$metric-$i-b"
                                stringc="$devicename-$url1-$metric-$i-c"
                                stringd="$devicename-$url1-$metric-$i-d"
                                stringe="$devicename-$url1-$metric-$i-e"


                                vala=`grep "$stringa[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                valb=`grep "$stringb[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                valc=`grep "$stringc[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                vald=`grep "$stringd[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                vale=`grep "$stringe[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                    
                            #echo "VALA - $vala"
                            #echo "VALB - $valb"
                            #echo "VALC - $valc"
                            #echo "VALD - $vald"
                            #echo "VALE - $vale"

                                if [ $vala -eq 0 ]
                                then
                                    echo "$stringa $val" >> "$proxyhistory"
                                    
                                elif [ $vala -ne 0 ] && [ $valb -eq 0 ]
                                then
                                    grep -v "$stringb" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringb $val" >> "$proxyhistory"
                            
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -eq 0 ]
                                then
                                    grep -v "$stringc" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringc $val" >> "$proxyhistory"
                                
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -eq 0 ]
                                then
                                    grep -v "$stringd" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringd $val" >> "$proxyhistory"
                                
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -eq 0 ]
                                then
                                    grep -v "$stringe" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringe $val" >> "$proxyhistory"

                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -ne 0 ]
                                then                    
                                    #threshold1=$[rate*-1]
                                    threshold1=$(printf "%s\n" "scale = 2; $rate*-1" | bc)
                                    threshold2=$rate

                                    diff1=$(printf "%s\n" "scale = 4; (($valb-$vala)/$vala)*100" | bc)
                                    diff2=$(printf "%s\n" "scale = 4; (($valc-$valb)/$valb)*100" | bc)
                                    diff3=$(printf "%s\n" "scale = 4; (($vald-$valc)/$valc)*100" | bc)
                                    diff4=$(printf "%s\n" "scale = 4; (($vale-$vald)/$vald)*100" | bc)
                                    diff5=$(printf "%s\n" "scale = 4; (($val-$vale)/$vale)*100" | bc)

                                #echo "DIFF1 - $diff1"
                                #echo "DIFF2 - $diff2"
                                #echo "DIFF3 - $diff3"
                                #echo "DIFF4 - $diff4"
                                #echo "DIFF5 - $diff5"

                                    overThreshold1=`echo "$diff1 > $threshold2" | bc`
                                    underThreshold1=`echo "$diff1 < $threshold1" | bc`
                                    overThreshold2=`echo "$diff2 > $threshold2" | bc`
                                    underThreshold2=`echo "$diff2 < $threshold1" | bc`
                                    overThreshold3=`echo "$diff3 > $threshold2" | bc`
                                    underThreshold3=`echo "$diff3 < $threshold1" | bc`
                                    overThreshold4=`echo "$diff4 > $threshold2" | bc`
                                    underThreshold4=`echo "$diff4 < $threshold1" | bc`
                                    overThreshold5=`echo "$diff5 > $threshold2" | bc`
                                    underThreshold5=`echo "$diff5 < $threshold1" | bc`

                                #echo "TH1 - $overThreshold1, $underThreshold1"
                                #echo "TH2 - $overThreshold2, $underThreshold2"
                                #echo "TH3 - $overThreshold3, $underThreshold3"
                                #echo "TH4 - $overThreshold4, $underThreshold4"
                                #echo "TH5 - $overThreshold5, $underThreshold5"

                                    thresh1="false"
                                    thresh2="false"
                                    thresh3="false"
                                    thresh4="false"
                                    thresh5="false"

                                    if [ $overThreshold1 -ne 0 ] || [ $underThreshold1 -ne 0 ]
                                    then
                                        thresh1="true"
                                    fi
                                    if [ $overThreshold2 -ne 0 ] || [ $underThreshold2 -ne 0 ]
                                    then
                                        thresh2="true"
                                    fi
                                    if [ $overThreshold3 -ne 0 ] || [ $underThreshold3 -ne 0 ]
                                    then
                                        thresh3="true"
                                    fi
                                    if [ $overThreshold4 -ne 0 ] || [ $underThreshold4 -ne 0 ]
                                    then
                                        thresh4="true"
                                    fi
                                    if [ $overThreshold5 -ne 0 ] || [ $underThreshold5 -ne 0 ]
                                    then
                                        thresh5="true"
                                    fi

                                    if [ "$thresh1" == "true" ] && [ "$thresh2" == "true" ] && [ "$thresh3" == "true" ] && [ "$thresh4" == "true" ] && [ "$thresh5" == "true" ]
                                    then
                                        echo "$alertlevel - $devicename -> $metric - $diff1%, $diff2%, $diff3%, $diff4%, $diff5%. Threshold: $threshold1% to $threshold2%"
                                        echo "-------------------------------------------------------------------------------------"
                                    fi
                                                            
                                    grep -v "$devicename-$url1-$metric-$i-" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"

                                    stringa="$devicename-$url1-$metric-$i-a $valb"
                                    stringb="$devicename-$url1-$metric-$i-b $valc"
                                    stringc="$devicename-$url1-$metric-$i-c $vald"
                                    stringd="$devicename-$url1-$metric-$i-d $vale"
                                    stringe="$devicename-$url1-$metric-$i-e $val"

                                    echo "$stringa" >> "$proxyhistory"
                                    echo "$stringb" >> "$proxyhistory"
                                    echo "$stringc" >> "$proxyhistory"
                                    echo "$stringd" >> "$proxyhistory"
                                    echo "$stringe" >> "$proxyhistory"

                                fi
                            done
                            sort "$proxyhistory" > "temp"
                            mv "temp" "$proxyhistory"
                        fi

                    fi    
                fi
            done < "$metriclist"
        else
            echo "$line did not respond to PING"
        fi
        echo "***********************************************************************************"    
    fi
done < "$proxylist"
done

Corona688 · May 11, 2010, 3:02pm

Wow, yeah... whenever you have

something="`echo $variable`"

just do

something="$variable"

Also,

ping -c 2 $line > /dev/null 2>&1
status=`echo $?`
if [ $status -eq 0 ]
then
...

can just be

if ping -c 2 $line > /dev/null 2>&1
then
...

Also, I'm not entirely sure what this line is doing:

if [ "$line" ]

...but if you're guarding against blank lines:

if [ ! -z "${line}" ]
...

Or better yet, do this. It will skip blank lines without another layer of nested if at all:

[ -z "${line}" ] && continue
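
Put together, the top of the loop could be flattened roughly like this. It is a sketch, not a drop-in replacement: is_up is a hypothetical stand-in for the ping test so the example runs anywhere, and the addresses are made up.

```shell
#!/bin/bash
# Hypothetical stand-in for: ping -c 2 "$1" > /dev/null 2>&1
is_up() { [ "$1" != "10.0.0.2" ]; }

checked=()
while read -r line
do
    [ -z "$line" ] && continue          # skip blank lines, no extra nesting

    if ! is_up "$line"                  # test the exit status directly
    then
        echo "$line did not respond to PING"
        continue
    fi

    checked+=( "$line" )
done <<EOF
10.0.0.1

10.0.0.2
10.0.0.3
EOF

echo "checked: ${checked[*]}"
```

Handling the failure cases first with continue keeps the main body at one level of indentation.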

Constructs like these are extremely slow since they can run cut uncountable numbers of times.

value1=`echo $value | cut -d ' ' -f $i`

Instead, since you're using a shell that supports arrays, just split it into an array once then use the array. This should split fine on spaces:

ARRAY=( $value )

...

value1="${ARRAY[i-1]}"   # bash arrays start at index 0, your loop counter starts at 1
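
A small self-contained illustration, with a made-up value string in place of the real retrievemetric1.sh output. The thing to watch is that bash arrays are indexed from 0, while cut -f counts fields from 1:

```shell
#!/bin/bash
value="12 7 93"          # hypothetical output of retrievemetric1.sh

# Split once on whitespace. Unquoted expansion also performs globbing,
# so this assumes the values contain no wildcard characters.
ARRAY=( $value )

# Indices run from 0 to count-1.
for (( i = 0; i < ${#ARRAY[@]}; i++ ))
do
    echo "value $((i + 1)) is ${ARRAY[i]}"
done
```

${#ARRAY[@]} also replaces the separate `echo $value | wc -w` call.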

You can also split on other characters by changing the IFS variable but be aware that this affects read too.

You're running grep many, many times per loop. This is slow. Instead of

if ! grep string1 file ; then str=no ; fi
if ! grep string2 file ; then str=no ; fi
...

try

HAS1=0
HAS2=0
HAS3=0
HAS4=0
while read TESTLINE
do
        [[ "${TESTLINE}" =~ $string1 ]] && HAS1=1
        [[ "${TESTLINE}" =~ $string2 ]] && HAS2=1
        [[ "${TESTLINE}" =~ $string3 ]] && HAS3=1
        [[ "${TESTLINE}" =~ $string4 ]] && HAS4=1
done < file
OKAY=0
[ "$HAS1" -gt 0 ] && [ "$HAS2" -gt 0 ] && [ "${HAS3}" -gt 0 ] && [ "${HAS4}" -gt 0 ] && OKAY=1

This reads the file only once and doesn't execute four extra processes. Note that the =~ regular expression operator is a bash extension and is not available in plain sh.
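
One caveat with =~: the right-hand side is treated as a regular expression, and keys built from URLs are likely to contain characters such as '.' that mean something in a regex. If a plain substring test is all that's needed, a quoted glob match may be the safer choice. A sketch with a made-up key and history line:

```shell
#!/bin/bash
# Hypothetical key in the style of the history file; note the '.' in the URL part.
string1="proxyA-/stats.html-cpu-1-a"

LITERAL=0
REGEX=0
while read -r TESTLINE
do
    # The quoted part of the pattern is matched literally, so '.' is just a dot.
    [[ "$TESTLINE" == *"$string1"* ]] && LITERAL=1
    # Unquoted, the same string is a regex, where '.' matches any character.
    [[ "$TESTLINE" =~ $string1 ]] && REGEX=1
done <<EOF
proxyA-/statsXhtml-cpu-1-a 40
EOF

echo "literal=$LITERAL regex=$REGEX"
```

Here the regex form reports a match on a line that does not actually contain the key, while the literal form does not.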

Whenever you have VAR=`something | grep something | grep something | grep something` that's an enormous performance waster, and likely possible with shell built-ins, though exactly how depends on what bits you want to get.
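
For the `grep ... | grep -o '[0-9]*$'` lines in particular, something along these lines might do, assuming each history line is a key followed by a single space and a number. The sample records are invented; the key deliberately contains a space, since the metric names can be several words long.

```shell
#!/bin/bash
# Hypothetical history data in the "key value" layout.
proxyhistory=$(mktemp)
printf '%s\n' 'dev-/u-cpu load-1-a 40' 'dev-/u-cpu load-1-b 42' > "$proxyhistory"

stringa="dev-/u-cpu load-1-a"
stringb="dev-/u-cpu load-1-b"
vala=""
valb=""

# One pass over the file, using parameter expansion instead of grep.
while read -r record
do
    key=${record% *}       # everything before the last space
    val=${record##* }      # everything after the last space
    case "$key" in
        "$stringa") vala=$val ;;   # quoted patterns are compared literally
        "$stringb") valb=$val ;;
    esac
done < "$proxyhistory"

echo "vala=$vala valb=$valb"
rm -f "$proxyhistory"
```

That replaces ten grep processes per metric with a single loop over the file.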

...and so forth and so forth. Your script is enormous. You might want to break it into functions so you can tell what's happening where. Functions are easy:

function myfunc
{
  echo $1 $2
}

myfunc a b

They act like processes in that they return numbers, not strings, read from stdin, and write to stdout/stderr. But they can set global variables (as long as they're not behind a pipe).
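
To make that concrete, here is a small sketch. The function name exceeds and the variable REASON are invented for the example; the second half shows the pipe caveat, where the function runs in a subshell and its assignment is lost to the caller.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (status 0) when the value exceeds the limit,
# and leaves a message in the global variable REASON.
exceeds() {
    if [ "$1" -gt "$2" ]
    then
        REASON="$1 is above $2"
        return 0
    fi
    REASON=""
    return 1
}

# The numeric return status can be tested directly by if.
if exceeds 95 90
then
    echo "alert: $REASON"
fi

# Behind a pipe the function runs in a subshell, so the caller's REASON
# keeps whatever value it had before.
REASON="unchanged"
echo | exceeds 95 90
echo "after pipe: $REASON"
```

This assumes bash's default settings; with `shopt -s lastpipe` in a script the last pipeline stage runs in the current shell instead.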

The Advanced Bash-Scripting Guide is a nice reference.

dunkar70 · May 11, 2010, 3:17pm

You can also keep the historical data manageable by tailing the file. Log all values into a single file, such as history.log
At the beginning of the log file processing, execute:

tail history.log > fileToProcess.log

This will give you a smaller file from which to get your historical data. The size of your history.log file will not matter; since tail prints the last 10 lines by default, your processing file will always contain the last 10 entries.
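
Since the original question only needs the last 5 values, tail's -n option can pick exactly that many. A minimal sketch, assuming a log with one value per line (the temporary file and its contents are made up):

```shell
#!/bin/bash
# Hypothetical log with one observed value per line, oldest first.
history_log=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 > "$history_log"

# Keep only the five most recent entries, split into an array.
last5=( $(tail -n 5 "$history_log") )

echo "${last5[*]}"
rm -f "$history_log"
```

This still costs one tail process per lookup, so it helps most when the log is long.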

Corona688 · May 11, 2010, 3:42pm

Furthermore:

echo "$stringa" >> "$proxyhistory"
echo "$stringb" >> "$proxyhistory"
echo "$stringc" >> "$proxyhistory"
echo "$stringd" >> "$proxyhistory"
echo "$stringe" >> "$proxyhistory"

can be done with

cat >> "$proxyhistory" <<EOF
$stringa
$stringb
$stringc
$stringd
$stringe
EOF

instead of running echo 5 separate times. Note that the last EOF must be at the START of the line and not indented.
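
Another option, offered as a sketch rather than a measured improvement: printf is a bash builtin and repeats its format for every argument, so one call can write all the lines without starting cat. The file and strings below are placeholders.

```shell
#!/bin/bash
proxyhistory=$(mktemp)      # hypothetical stand-in for the real history file
stringa="dev-/u-m-1-a 40"
stringb="dev-/u-m-1-b 0"
stringc="dev-/u-m-1-c 0"

# The format '%s\n' is reused for each argument, so this writes
# one line per string while opening the file only once.
printf '%s\n' "$stringa" "$stringb" "$stringc" >> "$proxyhistory"

cat "$proxyhistory"
```

Since echo is itself a builtin, the saving in either version comes mainly from opening the file once instead of five times.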

prafulnama · May 11, 2010, 4:58pm

Thanks!

@Corona: I could use the 'split array' for retrieving a specific word from a string but for retrieving a substring, is there an alternative to

something=`echo $line | cut -d ' ' -f $start-$end`

Corona688 · May 11, 2010, 6:05pm

A substring is

${VARNAME:START:LENGTH}

Expressions are allowed in there, so

${VARNAME:$START:((END-START))}
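
Note that this form counts characters. For a range of space-separated words, which is what cut -f does, the same offset and length syntax also works on an array. A sketch with a made-up metric-list line (the field layout is assumed from the posted script):

```shell
#!/bin/bash
# Hypothetical metric-list entry: alertlevel, url, metric words, rate, RATE.
line1="CRITICAL /stats.html cpu load average 20 RATE"

words=( $line1 )
count=${#words[@]}

threshold=${words[count-1]}
rate=${words[count-2]}

# Fields 3 .. count-2 in cut's 1-based numbering are indices 2 .. count-3,
# i.e. count-4 elements starting at index 2.
metric="${words[*]:2:count-4}"

echo "$metric | $rate | $threshold"
```

This replaces the wc and all three cut calls for that line with one array split.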