Hi,
I have to write a script that searches for a list of numbers in a set of zipped files.
For example, the file file1.txt contains the numbers, 500,000 of them, and I have to search for each number in the zipped files (there are around 1000 zipped files, each about 5 MB).
If a number does not appear in any of the zipped files, I have to write it to an output file.
file1.txt
--------
7234834
2342346
65745654634
345423534
.
.
.
.
783458934
345345
All of these numbers must be searched for in the following zipped files:
abc.txt.gz.processed
xyz.txt.gz.processed
ere.txt.gz.processed
gfdf.txt.gz.processed
dfg.txt.gz.processed
dgg.txt.gz.processed
.
.
.
kjh.txt.gz.processed
outputfile.txt
number 35345, not found.
number 345345, not found.
number 87979, not found.
number 234234234, not found.
.
.
.
number 234234234, not found.
number 234234234, not found.
Sample zipped file format (two records shown):
KKKKK 1454545345 842011011920025500000001287009909427909 031378055730681 KKKKKK AAA MMMMMMM034535345345345345
.
.
.
.
KKKKK 1454545345 842011011920025500000001287009909427909 03156456456546 KKKKKK AAA MMMMMMM034535345345345345
The item shown in red is the number to search for.
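If the number always sits in the same column of each record, only that column needs to be kept when collecting the numbers that occur in the files, rather than every field. A minimal sketch, assuming whitespace-separated fields; the helper name extract_field is hypothetical, and column 4 in the example comment is only a guess (it is the one field that differs between the two sample records).

```shell
#!/usr/bin/ksh
# extract_field N  (hypothetical helper)
# Reads decompressed records on stdin and prints field N of each record,
# one value per line. Fields are split on whitespace (awk's default).
# Which column holds the number to search for is an assumption.
extract_field() {
    awk -v f="$1" 'NF >= f { print $f }'
}

# Example use (column 4 is a guess, not confirmed by the sample data):
#   find . -name '*processed' -exec gunzip -c {} + | extract_field 4 | LC_ALL=C sort -u
```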
I wrote a script, but it is far too slow: it takes around 2 minutes to search for one number, so searching for all of them would take 500,000 × 2 minutes. That is not a feasible solution, because I have to run this script daily. If I run the searches in the background, Unix fails with an error saying that it cannot fork because there are too many processes.
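A rough estimate from the figures above shows the scale of the problem, assuming the 2 minutes per number stays constant:

$$
500{,}000 \times 2\ \text{min} = 1{,}000{,}000\ \text{min} \approx 16{,}667\ \text{h} \approx 694\ \text{days}
$$

Each of those searches also decompresses all of the roughly $1000 \times 5\ \text{MB} \approx 5\ \text{GB}$ of zipped data again, so most of the time presumably goes into decompressing the same files over and over rather than into the comparison itself.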
The script that I wrote is:
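#!/usr/bin/ksh
# For each number, decompress every *processed file and grep for it.
# A trailing "&" on the search line would start one background pipeline
# per number (500,000 of them), which exhausts the process table.
while IFS= read -r num
do
    find . -name "*processed" -print | xargs gunzip -c | grep -q -- "$num" ||
        echo "$num not found" >> outputfile.txt
done < file1.txt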
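One way to avoid decompressing every file once per number is to decompress everything a single time, build a sorted list of the distinct fields that occur, and compare that list with the sorted numbers using comm. This is a minimal sketch, not a tested production script: the function name find_missing is hypothetical, and it assumes a number counts as found only when it appears as a whole whitespace-separated field (the original grep -q also matches numbers inside longer fields).

```shell
#!/usr/bin/ksh
# find_missing LIST DIR OUT  (hypothetical helper name)
# Decompresses every *processed file under DIR once, then writes to OUT
# every number from LIST that does not occur as a whole field in any file.
find_missing() {
    list=$1 dir=$2 out=$3
    tokens=${TMPDIR:-/tmp}/find_missing.$$

    # Single pass over all files: one field per line, each distinct field once.
    # LC_ALL=C gives sort and comm the same byte-wise ordering.
    find "$dir" -name '*processed' -exec gunzip -c {} + |
        tr -s ' \t' '\n\n' | LC_ALL=C sort -u > "$tokens"

    # comm -23 keeps lines that occur only in the first input,
    # i.e. numbers from LIST that are missing from every file.
    LC_ALL=C sort -u "$list" | LC_ALL=C comm -23 - "$tokens" |
        sed 's/.*/number &, not found./' > "$out"

    rm -f "$tokens"
}
```

Usage would be something like: find_missing file1.txt . outputfile.txt. The cost is now one decompression of the roughly 5 GB plus two sorts, instead of one decompression per number; for large inputs sort spills to temporary files, so TMPDIR needs enough free space. If the numbers can also occur inside longer fields, this token comparison will not find them, and a different approach (for example grep -F -f with the whole number list) would be needed.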
Please help me tune this script so that it produces the output in less time.
Thanks