grep 1000s of files with 1000s of grep values

Hi,

I have around 200,000 files in a given directory.

I need to cat each of these files and grep them for thousands of identifier values (or strings) in a given text file.

The text file looks something like this:

1234
1243545
1234353
121324

etc with thousands of entries.

Can you please advise how I can do this in the most efficient manner possible? This script will no doubt take a long time to run.

Thanks in advance.

Mantis

Try:

grep -f text_file many_files > new_file
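With 200,000 files, expanding all the file names on one command line may fail with "Argument list too long". Below is a minimal sketch of one way around that. It assumes the identifiers are literal strings rather than regular expressions (hence -F) and that find and xargs support -print0 and -0, as the GNU and BSD versions do. The scratch directory and the names ids.txt and matches.txt are made up for the demo.

```shell
#!/bin/bash
# Demo setup: a scratch directory with a few sample files (hypothetical data).
demo=$(mktemp -d)
mkdir "$demo/files"
printf '1234\n121324\n'   > "$demo/ids.txt"
printf 'id 1234 here\n'   > "$demo/files/a.txt"
printf 'nothing useful\n' > "$demo/files/b.txt"
printf 'id 121324 here\n' > "$demo/files/c.txt"

# find hands the file names to xargs in batches, so no single command
# line gets too long. -F treats each pattern as a fixed string and -l
# prints only the names of files that contain a match.
find "$demo/files" -type f -print0 |
  xargs -0 grep -l -F -f "$demo/ids.txt" | sort > "$demo/matches.txt"

cat "$demo/matches.txt"
```

If an identifier such as 1234 must not match inside a longer number such as 12345, adding -w (match whole words only) may help.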

This task will definitely complete before the next ice age sets in. (humor... sort of)

Consider adding some parallelism. This will only do well on a multi-CPU box, or one with a CPU that supports the equivalent of hyperthreads. rdrtx1's solution is as good as it gets for a single-CPU box. You may be able to run two processes in parallel there; I do not know.

Split your pattern file into several smaller files, because the more lines you have in the pattern file, the more CPU is spent looking at each line in the search file.

Example: a 1000-line pattern file split into n files of m lines each. 4 x 250 would work; 8 x 125 might be better.
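The split itself could be done with the standard split utility. A small sketch, assuming a 1000-line pattern file; the names patterns.txt and the chunk_ prefix are placeholders, and seq is used only to generate demo data.

```shell
#!/bin/bash
# Demo setup: build a 1000-line pattern file in a scratch directory.
work=$(mktemp -d)
seq 1 1000 > "$work/patterns.txt"

# Cut the pattern file into pieces of 250 lines each.
# split names the pieces chunk_aa, chunk_ab, chunk_ac, chunk_ad.
split -l 250 "$work/patterns.txt" "$work/chunk_"

# Show how many lines ended up in each piece.
wc -l "$work"/chunk_*
```

Using -l 125 instead would give the 8 x 125 layout.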

This benefits from disk controller caching and having grep run through fewer lines of patterns for each line of source. Let's say you think 8 parallel processes will do well.
Some systems do NOT do better with this, so set up a small test first.
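A small test along those lines might look like the sketch below. Everything in it is made-up demo data: the scratch directory, the sample files, and the part_ prefix. It assumes each input line is a bare identifier, so -F -x (fixed string, whole line) applies. On a real sample the elapsed seconds are the numbers to compare; on this tiny demo both will likely print 0.

```shell
#!/bin/bash
# Demo setup: small sample of files and patterns (hypothetical data).
t=$(mktemp -d)
mkdir "$t/sample"
seq 1 400 > "$t/patterns.txt"
for i in 1 2 3 4 5; do seq "$i" 7 2000 > "$t/sample/f$i"; done
split -l 100 "$t/patterns.txt" "$t/part_"

# Variant A: one grep with the whole pattern file.
start=$(date +%s)
grep -h -F -x -f "$t/patterns.txt" "$t"/sample/* | sort > "$t/single.out"
echo "single grep: $(( $(date +%s) - start ))s"

# Variant B: one background grep per chunk, each writing its own output file.
start=$(date +%s)
for p in "$t"/part_*; do
  grep -h -F -x -f "$p" "$t"/sample/* > "$p.out" &
done
wait
sort "$t"/part_*.out > "$t/split.out"
echo "split greps: $(( $(date +%s) - start ))s"

# Both variants should find the same lines.
cmp -s "$t/single.out" "$t/split.out" && echo "results match"
```

The -x flag matters for the comparison: without it, a line could match patterns in two different chunks, and the split variant would then report that line more than once.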

#!/bin/bash
# Search every file against 8 pattern-file chunks in parallel.
cd /directory/with/zillions/of/files || exit 1

# Truncate the result file before appending to it.
> /path/to/result

# A glob instead of parsing ls output, so odd file names are handled.
for fname in *
do
 # One background grep per pattern chunk (file1 .. file8).
 for n in 1 2 3 4 5 6 7 8
 do
  grep -f "/path/to/file$n" "$fname" >> /path/to/result &
 done
 # Wait for all 8 greps before moving on to the next file.
 wait
done

Thank you so much, fellas. I have been so busy that I was only able to try your suggestions today.

Regarding the parallelism, that is a great idea. The problem is that I have to put a condition after the grep, e.g.:

grep -f /path/to/file1  $fname >> /path/to/result  &
        if [ $? = 0 ];
        then
        cp -p $fname $PATH
        fi

But this won't work correctly because of the &: $? then reports only that the background job was started, which is always 0, not whether grep found a match. So how do I use parallelism in a script like this?

Thanks again.
Mantis
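
One possible way to handle this, offered as a sketch rather than a tested solution for the real data: put the grep and the copy together in a function and send the whole function to the background, so the condition is evaluated inside the background job. The function name check_and_copy, the max_jobs value, and the scratch directories are all hypothetical, and -F assumes the identifiers are literal strings.

```shell
#!/bin/bash
# Demo setup (hypothetical data): source files, a pattern file, a destination.
d=$(mktemp -d)
mkdir "$d/src" "$d/dest"
printf '1234\n'     > "$d/ids.txt"
printf 'row 1234\n' > "$d/src/hit.txt"
printf 'row 9999\n' > "$d/src/miss.txt"

# Test one file and copy it if any pattern matches.
# grep -q prints nothing and only sets the exit status.
check_and_copy() {
  if grep -q -F -f "$d/ids.txt" "$1"; then
    cp -p "$1" "$d/dest/"
  fi
}

max_jobs=4   # assumed number of parallel jobs, tune for the machine
count=0
for fname in "$d"/src/*; do
  check_and_copy "$fname" &      # the if runs inside the background job
  count=$((count + 1))
  if [ "$count" -ge "$max_jobs" ]; then
    wait                         # let the current batch finish
    count=0
  fi
done
wait                             # wait for any remaining jobs

ls "$d/dest"
```

Here the parallelism is across files rather than across pattern chunks, so each file is tested once against the full pattern list and is never copied twice. The destination is kept in its own variable instead of PATH, since PATH is the shell's command search path and overwriting it would break command lookup.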