Hello,
I was wondering if anyone knows a faster way to search and compare strings and dates from two files.
I'm currently using nested "for" loops, but this seems sluggish, as I have to cycle through 10 directories with 10 files each, every file containing thousands of lines.
Given:
-10 directories
-10 files per directory (tab delimited)
-1 lookup file
-1 report file
Here's what I'm trying to achieve:
1. Loop through the 10 dirs, each with 10 files
2. For every line read from a file:
   a. get the column 1 string and grep for it in the lookup file, writing the output to a result file
   b. get the column 3 date and store it as vardate
   c. for every line of the result file from 2.a:
      c.I get column 2 date as varstart
      c.II get column 3 date as varend
      c.III get column 7 as lastcol
      c.IV check if vardate from 2.b is between varstart and varend
      c.V if vardate is between varstart and varend, write the line from 2. + varstart + varend + lastcol to the report file
Here is my straightforward solution so far. It works fine, but it's neither elegant nor fast:
for loop through the 10 dirs
do
    for loop through 10 files
    do
        while read LINE of file
        do
            grep column 1 from lookup file > result file
            vardate=column 3
            while read ROW of result file
            do
                varstart=column 2
                varend=column 3
                lastcol=column 7
                if [[ varstart -le vardate && varend -ge vardate ]]
                then
                    printf "LINE\tvarstart\tvarend\tlastcol\n" >> report file
                else
                    :
                fi
            done
        done
    done
done
I was trying to replace the if statement with the following (from what I've read, awk is much faster at line-by-line processing):
awk '{varstart=$2; varend=$3; lastcol=$7; getline;} varstart <= vardate && varend >= vardate {print l,varstart,varend,lastcol}' l=${LINE} vardate=column3 resultfile >> reportfile
But I can't seem to get it to work properly. I was also thinking of using awk for the parent while loop to be more efficient, but I have no idea whether it's possible to use another awk within an awk statement.
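For reference, two things in the one-liner above look likely to cause trouble: the unconditional getline advances to the next row before the pattern is tested, so every second row of the result file is never checked, and the unquoted l=${LINE} is split by the shell at the tabs. Below is a minimal sketch of the same check without those two problems. The function name filter_matches is hypothetical, and the sketch assumes the dates are plain integers such as 20240115 (which the -le/-ge test in the loop already implies).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is not from the original script).
# Prints LINE + varstart + varend + lastcol for every row of the result file
# whose range [column 2, column 3] contains vardate.
# Assumption: dates are sortable integers such as 20240115.
filter_matches() {
    local line=$1 vardate=$2 resultfile=$3
    # The line is passed through the environment so awk receives it verbatim;
    # -v and command-line assignments would interpret backslash escapes in it.
    LINE="$line" awk -F'\t' -v OFS='\t' -v vardate="$vardate" '
        # No getline here: awk already runs this rule once per input row.
        # Adding 0 forces a numeric comparison.
        $2 + 0 <= vardate + 0 && $3 + 0 >= vardate + 0 {
            print ENVIRON["LINE"], $2, $3, $7
        }
    ' "$resultfile"
}
```

Inside the while loop this could be called as filter_matches "$LINE" "$vardate" resultfile >> reportfile. It still starts one awk process per input line, so it only removes the inner loop.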
My goal is just to make this script run faster. Any suggestions or alternative approaches are much appreciated.
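One alternative that avoids both the nested loops and an awk inside an awk is to let a single awk process read the lookup file into memory first and then stream all the data files past it. The sketch below rests on assumptions that are not stated above: the key sits in column 1 of the lookup file and is matched exactly (grep would also match substrings anywhere in the line), the lookup file is not empty, and dates are integers such as 20240115. The function name build_report and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_report LOOKUP_FILE DATA_FILE...
# Assumptions: lookup key is column 1 (exact match), dates are integers,
# and the lookup file has at least one line (NR == FNR relies on that).
build_report() {
    local lookup=$1
    shift
    awk -F'\t' -v OFS='\t' '
        # First file (the lookup): store every range seen for each key.
        NR == FNR {
            n = ++count[$1]
            lo[$1, n] = $2      # varstart
            hi[$1, n] = $3      # varend
            tail[$1, n] = $7    # lastcol
            next
        }
        # All remaining files: compare column 3 with each stored range
        # for the key in column 1 and print the matches.
        {
            for (i = 1; i <= count[$1]; i++)
                if (lo[$1, i] + 0 <= $3 + 0 && hi[$1, i] + 0 >= $3 + 0)
                    print $0, lo[$1, i], hi[$1, i], tail[$1, i]
        }
    ' "$lookup" "$@"
}
```

A call such as build_report lookup.txt dir*/* > report.txt would then replace all four loops, with the lookup file read once instead of once per input line.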
Thank you very much, guys.