Speed Up Grep

nanthagopal · February 1, 2013, 11:12pm

Hi,

I have to grep a string from 20 to 30 files, each 200 to 300 MB in size, and append the matches to a file. How can I speed up the grep?

cat catalina.out_2012_01_01 | grep "xxxxx"  >> backup.txt

Please suggest a solution.

Regards,
Nanthagopal A

Yoda · February 1, 2013, 11:29pm

First of all, do not use cat with grep; grep can read the file directly.

If you are searching for a fixed string, try fgrep instead, which is much faster.

fgrep "xxxx" catalina.out_2012_01_01 > backup.txt

nanthagopal · February 1, 2013, 11:43pm

I have changed grep to fgrep and the performance is still the same.

jim_mcnamara · February 1, 2013, 11:45pm

For multiple files (grep -F is the same as fgrep, you should use one of these)
This runs five searches at a time in parallel. So it is approximately 5 times faster than a single thread of grep, on modern multi-core cpus.

cnt=0;
for i in catalina.out* 
do
   cnt=$(( $cnt + 1 ))
   grep -F 'xxxxxx' $i >> backup.txt
   [ $(( $cnt % 5  ))  -eq 0 ]  && wait
done
wait

Don_Cragun · February 2, 2013, 3:59am

jim mcnamara:

For multiple files (grep -F is the same as fgrep, you should use one of these)
This runs five searches at a time in parallel. So it is approximately 5 times faster than a single thread of grep, on modern multi-core cpus.
cnt=0;
for i in catalina.out* 
do
   cnt=$(( $cnt + 1 ))
   grep -F 'xxxxxx' $i >> backup.txt
   [ $(( $cnt % 5  ))  -eq 0 ]  && wait
done
wait

Hi Jim,
Am I correct in assuming that you intended to have an & after backup.txt in this for loop?

Should the number of greps run in parallel be tied to the number of CPU cores available to the process, or is grep I/O-limited?
---------------------------
Added later: Are you sure that the output from grep is line-buffered? If not, you could end up with parts of lines intermixed when multiple greps write to a single output file!
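
For illustration, here is a minimal sketch of the loop above with both concerns addressed: each grep is put in the background with &, and each one writes to its own temporary file so that partial lines can never be intermixed. The function name parallel_grep, the batch size of 5, and the use of mktemp -d (widely available, but not required by POSIX) are assumptions made for this sketch, not something taken from the posts above. Whether it is actually faster depends on whether the job is CPU-bound or I/O-bound.

```shell
#!/bin/sh
# Hypothetical helper: parallel_grep PATTERN OUTFILE FILE...
# Runs up to max_jobs fixed-string greps at a time, each writing to its own
# temporary file, then appends the results to OUTFILE in input-file order.
parallel_grep() {
    pattern=$1
    out=$2
    shift 2
    max_jobs=5                        # assumed batch size; tune to cores/disks
    tmpdir=$(mktemp -d) || return 1   # private scratch directory
    cnt=0
    for f in "$@"; do
        cnt=$((cnt + 1))
        # The trailing & runs this grep in the background.
        grep -F -- "$pattern" "$f" > "$tmpdir/$cnt" &
        # After every max_jobs launches, wait for the whole batch to finish.
        if [ $((cnt % max_jobs)) -eq 0 ]; then
            wait
        fi
    done
    wait                              # wait for the final, partial batch
    # Append the per-file results one after another, so lines stay whole.
    i=1
    while [ "$i" -le "$cnt" ]; do
        cat "$tmpdir/$i" >> "$out"
        i=$((i + 1))
    done
    rm -rf "$tmpdir"
}
```

It would be called as: parallel_grep 'xxxxxx' backup.txt catalina.out*
Note that wait with no arguments waits for every background job of the current shell, so this sketch assumes no unrelated background jobs are running.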

Chubler_XL · February 3, 2013, 8:22pm

You could try using the --mmap flag and process all files with one grep call (as there are only 20 to 30 files, the command line length should be OK). This assumes a GNU grep that supports the --mmap option, which reads files with the mmap(2) system call, and that the files don't shrink while grep is working:

grep -F --mmap -h "xxxxxx" catalina.out* >> backup.txt

Edit: use -F if your search string is fixed (i.e., not a regular expression).
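
Since not every grep accepts --mmap (some GNU grep man pages describe it as ignored for backwards compatibility, and other versions may reject it), a cautious sketch could probe for the option first and fall back to a plain single call. The function name grep_once is hypothetical, and the probe is only an assumption about how an unsupported option shows up (a non-zero exit status).

```shell
#!/bin/sh
# Hypothetical helper: grep_once PATTERN OUTFILE FILE...
# Searches all files with a single fixed-string grep call and appends the
# matching lines (without file name prefixes, because of -h) to OUTFILE.
grep_once() {
    pattern=$1
    out=$2
    shift 2
    # Probe: does this grep accept --mmap? Errors and warnings are discarded.
    if printf 'x\n' | grep -F --mmap x >/dev/null 2>&1; then
        grep -F --mmap -h -- "$pattern" "$@" >> "$out"
    else
        # Fallback when --mmap is rejected: same search without it.
        grep -F -h -- "$pattern" "$@" >> "$out"
    fi
}
```

It would be called as: grep_once 'xxxxxx' backup.txt catalina.out*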