Improve the performance of a shell script

Hi Friends,

I wrote the shell script below to generate a report on the alert messages received in a day. But to process around 4,500 lines (alerts), the script takes around 30 minutes.

Please help me make it faster and improve the performance of the script. I would be very happy if this report could be generated within 5 minutes, if possible.
Below is my shell script:

cp a_15.txt abc_15.txt && cp a_15.txt xyz_15.txt


total_word_count=0; match_word_count=0; alerts_matched=0;

while read outer_line
do
echo -e "$outer_line"
total_word_count=`echo $outer_line |tr '[ : .' ' ' |awk '{ print NF} '`
outer_line=`echo $outer_line |tr '[ : .' ' '`

##
while read inner_line
do

###
for word in $outer_line
do
echo $inner_line |grep -i -w "$word" 1>/dev/null
if [ $? -eq 0 ];then 
match_word_count=`echo $match_word_count + 1|bc`
else 
:
fi
done
###

match_pcnt=`echo "scale=2; $match_word_count/$total_word_count*100"|bc |awk -F"." '{print $1}'`


if [ $match_pcnt -ge 66 ];then
alerts_matched=`echo $alerts_matched + 1|bc`
inner_line=`echo $inner_line| tr '[ : .' '.'`
sed -i "s/$inner_line//g" ./abc_15.txt
sed -i '/^$/d' ./abc_15.txt

else
:
fi

match_word_count=0;
match_pcnt=0

done <./abc_15.txt
##
echo -e "\nALERTS_MATCHED: $alerts_matched\n\n"

alerts_matched=0;
cat ./abc_15.txt >./xyz_15.txt

done <./xyz_15.txt >APS.out

---------- Post updated at 10:00 AM ---------- Previous update was at 09:55 AM ----------

The alert messages (4000 lines) will be like below

ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es

The report i want looks like below

ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS_MATCHED: 300

ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS_MATCHED: 55

ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
ALERTS_MATCHED: 700

I can barely read the script because of the lack of contrast between foreground and background colors.

Someone please help me, it's quite URGENT.

Okay - please don't bump your posts. Also, if this is an emergency, which doesn't seem likely, please post emergency requests in the emergency forums.
Thank you.

You create a child process every time you use the ` ` construct. You are creating further child processes with each awk, bc, etc. invocation. Some of your lines of code create 3 or 4 child processes. Those commands are inside nested loops. You are creating thousands of children, and each one incurs a lot of resource usage.
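As a rough illustration of the point above, here is a minimal sketch of the per-word test and the percentage arithmetic done with shell builtins only, so nothing is forked per word. The function and variable names (count_words, count_matching_words, match_percent and the variables they set) are made up for this example. It compares whole space-separated words case-sensitively, so it is a simplification of grep -i -w, not an exact replacement.

```shell
#!/bin/sh
# Sketch only: builtin replacements for the echo|awk, echo|grep and echo|bc
# pipelines in the original loops. All names here are illustrative.

# Sets total_words to the number of whitespace-separated words in $1.
count_words() {
    set -f              # do not expand * or ? while splitting
    set -- $1
    set +f
    total_words=$#
}

# Sets matched_words to how many words of $1 also occur as whole
# space-separated words in $2. Case-sensitive; lowercase both lines once
# beforehand if case must be ignored.
count_matching_words() {
    cmw_haystack=" $2 "
    matched_words=0
    set -f
    for cmw_word in $1; do
        case $cmw_haystack in
            *" $cmw_word "*) matched_words=$((matched_words + 1)) ;;
        esac
    done
    set +f
}

# Sets match_pcnt to the integer percentage $1/$2 using shell arithmetic.
match_percent() {
    if [ "$2" -eq 0 ]; then
        match_pcnt=0
    else
        match_pcnt=$(( $1 * 100 / $2 ))
    fi
}
```

Calling count_matching_words "$outer_line" "$inner_line" followed by match_percent "$matched_words" "$total_words" would stand in for the body of the for loop and the bc line. The nested loops themselves are still there, so this only removes the per-iteration process cost.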

Does this work for you?

 
awk '{a[$0]++} END { for(i in a) { print i "\n" "ALERTS_MATCHED: " a[i] } }' input_file

Ignoring what the script does at the moment, what constitutes a "match"?
Could you highlight those parts of some sample messages which you are trying to match.

Comment on efficiency and logic:
The script inner loop is executed 4,000 x 4,000 = 16,000,000 times and then two in-situ edit "sed -i" commands are executed on the INPUT file to the inner loop (./abc_15.txt) for every "match". Possibly an attempt to reduce processing by removing records from one of the copies of the input file.

I guess that this is some Linux variant with bash?

Although this does not match your exact output format, it can do the job

sort some-alert-file | uniq -c

And chihung's shell solution can be extended further to match the report output -

$
$
$ cat -n alert.log
     1  ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
     2  ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
     3  ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
     4  ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
     5  ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
     6  ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
     7  ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
     8  ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
     9  ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
$
$
$ sort alert.log | uniq -c | while read -r num text; do printf '%s\nALERTS MATCHED: %s\n' "$text" "$num"; done
ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
ALERTS MATCHED: 3
ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS MATCHED: 4
ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS MATCHED: 2
$
$

I gave up after reading the first 10 lines of the OP's script, but I guess this is what it's trying to do.

tyler_durden

I think the O/P is trying to match the messages BUT ignoring some parts of the message, such as perhaps the timestamp. The original script tests for a percentage match in words in the message, which is a bit too fuzzy for me.

We await the O/P's definition of a "match".
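If the timestamp guess is right, one possible sketch is to mask the varying fields first and then count exact duplicates of what is left. This is only a guess at the rule: it assumes the timestamp always looks like HH:MM:SS,mmm and that thread names start with Thread-<number>, neither of which the O/P has confirmed. The function name count_masked_alerts is made up for the example.

```shell
#!/bin/sh
# Sketch: mask fields assumed to vary between otherwise identical alerts,
# then count identical lines, most frequent first.
# Assumptions (not confirmed in the thread): timestamps look like
# HH:MM:SS,mmm and thread names look like Thread-<number>.
count_masked_alerts() {
    sed -e 's/[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]/<time>/' \
        -e 's/Thread-[0-9][0-9]*/Thread-<n>/' "$1" |
    sort | uniq -c | sort -rn
}
```

Usage would be count_masked_alerts alert.log. Anything else that varies between alerts (pool names, object addresses) would need its own masking rule.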

Hi All,

Apologies if my post is unclear.

Let me explain.

My alert.log will contain thousands of error messages. Among them only a very few will be exactly the same. But 50-100 alerts may have more or less the same pattern (matching 75% of the words in a line). So I want to count not only the exactly matching lines, but also the alerts where 75% of the words match, as matched alerts.

Example:

Two or three of the lines below match exactly, and the other lines also match, at least 75%. So I consider all the lines below as matched lines. I hope I'm clear now :)


Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.
Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

Shell Programming Post questions about KSH, BASH, PERL, PHP, SED, AWK and OTHER shell scripting languages here.

Shell Scripting Post questions about , SH, BASH, PERL, PHP, SED, OTHER shell scripts and shell scripting here.

Shell Programming and Scripting about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and  scripting languages here.

Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

---------- Post updated at 06:38 AM ---------- Previous update was at 06:33 AM ----------

As my error messages won't match exactly, I could not use the sort | uniq -c command. Since at least 75% of the words match, I'm comparing them word by word (the for loop). I agree that, as there are thousands of lines in alert.log, the while and for loops may execute millions of times; hence I have approached you experts for a better solution.

The inner and outer while loops use the same file. For my convenience I have copied it to two different files.

Lateral thought. If we correct and enhance your "tr" command to remove all awkward characters, one sample message comes out as:

ERROR 19 45 40 529 ERROR Thread 26 RunnerPool Std RamPricingQueryRunner run 493 Got exception parsing VAN query result for product APS Q

If we ignore the portion in red, does the rest of the message constitute a "match"?
For that matter is the "493" just an error message number which would be good enough for a "match"?

The enhanced "tr" is:
tr '\-[]():.,' '       '|tr -s '[:space:]'
Note that there are no space characters in the first string.
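Putting the pieces of this thread together, here is a hedged sketch of the O/P's percentage rule done in a single awk process instead of nested shell loops. It is not a confirmed answer to what a "match" is: it replaces the same punctuation as the enhanced "tr" with spaces, lowercases the line, and then groups greedily in file order, the way the original script does. The function name group_alerts and the default threshold of 66 (taken from the original script; the O/P also mentions 75) are assumptions of this example.

```shell
#!/bin/sh
# Sketch: group_alerts FILE [THRESHOLD]
# The first ungrouped line becomes a representative; every later ungrouped
# line containing at least THRESHOLD percent of its words joins the group.
group_alerts() {
    awk -v threshold="${2:-66}" '
    # Replace - [ ] ( ) : . , with spaces, then lowercase.
    function normalize(s) {
        gsub(/\[|\]|\(|\)|:|\.|,|-/, " ", s)
        return tolower(s)
    }
    {
        raw[NR] = $0
        norm[NR] = normalize($0)
    }
    END {
        for (i = 1; i <= NR; i++) {
            if (i in used) continue
            n = split(norm[i], rep, " ")
            if (n == 0) continue            # skip empty lines
            group_size = 0
            for (j = i; j <= NR; j++) {
                if (j in used) continue
                split("", seen)             # portable way to clear an array
                m = split(norm[j], w, " ")
                for (k = 1; k <= m; k++) seen[w[k]] = 1
                hits = 0
                for (k = 1; k <= n; k++) if (rep[k] in seen) hits++
                if (hits * 100 >= threshold * n) {
                    used[j] = 1
                    group_size++
                }
            }
            print raw[i]
            print "ALERTS_MATCHED: " group_size
            print ""
        }
    }' "$1"
}
```

Usage would be something like group_alerts alert.log 75 >APS.out. The work is still quadratic in the worst case (every line its own group), but no child process is started per comparison and the input file is never rewritten. Words are compared as whole tokens after normalization, which is close to but not identical to what grep -i -w does.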