pooga17
February 13, 2008, 5:06am
1
Hi All,
I am using the grep command to find the string "abc" in a file.
The content of the file is:
***********
abc = xyz
def= lmn
************
I have used the command below to redirect the output to a temp file:
grep abc file | sort -u | awk '{print #3 }' > out_file
Then I am searching for the contents of out_file in multiple files, using the command below:
grep -f out_file l*view_data_file
But this is very slow. Is there any way I can improve grep performance?
Thanks in advance
otheus
February 13, 2008, 6:19am
2
I think you mean $3, not #3.
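With that fixed, the first step of your own pipeline would read:
grep abc file | sort -u | awk '{print $3}' > out_file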
What's with the l* ? Is that a typo?
Do you need to know which file contains the string? If not, it would be faster to merge all the files together, and then do the grep.
cat *data_files.dat | grep -f out_file
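You can also skip the cat and hand the files to grep directly; a rough equivalent, assuming your grep supports -h to suppress the filename prefix so the output matches the piped version:
grep -h -f out_file *data_files.dat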
Otherwise, you can do a parallelized search, assuming you can take advantage of a multi-CPU system:
# one background grep per file, all appending to one scratch file
for f in *data_files.dat ; do
    grep -f out_file $f >>grep-out.$$ &
done
wait    # let all the background greps finish
cat grep-out.$$
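Note that all the greps append to the same grep-out.$$ scratch file, so lines from different input files arrive interleaved; remove the scratch file (rm grep-out.$$) once you are done with it.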
Of course, if there are thousands of dat files, this might bring the system "to its knees". In that case, you can have each grep do 5 at a time.
# hand each background grep five files at a time
ls -1 *data_files.dat |
while read f1; do
    read f2
    read f3
    read f4
    read f5
    grep -f out_file $f1 $f2 $f3 $f4 $f5 >>grep-out.$$ &
done
If any files contain spaces or strange characters, you'll need to enclose each variable in double-quotes.
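Putting that together, a sketch of the quoted variant, assuming a Bourne-style shell. Two details to watch: ${f2:+"$f2"} passes the quoted name only when the read actually got one, so a short final batch doesn't hand grep empty arguments; and the wait has to sit inside the same subshell as the loop, because the pipeline runs the while in a subshell and wait only sees children of its own shell:
ls -1 *data_files.dat | {
    while read f1; do
        read f2; read f3; read f4; read f5
        grep -f out_file "$f1" ${f2:+"$f2"} ${f3:+"$f3"} ${f4:+"$f4"} ${f5:+"$f5"} >>grep-out.$$ &
    done
    wait    # same subshell as the loop, so it sees the background greps
}
cat grep-out.$$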
Tytalus
February 13, 2008, 6:34am
3
Depending on the number and size of the files you are searching, and assuming your patterns are fixed strings (no regular expressions), you may see better performance with fgrep rather than grep.
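For example, the slow lookup from the first post would simply become:
fgrep -f out_file l*view_data_file
fgrep treats every line of out_file as a literal string instead of a regular expression, which spares the regex engine; on most modern systems the same thing is spelled grep -F.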