KSH: Opening Files based on a file list

drumminfool91 · August 7, 2010, 11:00pm

I'd like to grep files for keywords using the Korn shell and compile the full contents (not just the file names) of every file that contains a combination of those keywords into one repository file for reference. However, I'm stuck at the combining part. Here's what I have so far:

egrep key * | egrep word

This finds the lines (each prefixed with its file name) that contain both key and word, which tells me which files match. Now I want to combine the full contents of those files. Does anybody know how to do this? I'm sure it involves cat, but I'm having a hard time thinking outside the box on this.

Thanks in advance!

guruprasadpr · August 7, 2010, 11:20pm

Hi
Not sure if I understood you correctly.

egrep -h '(key.*word|word.*key)' * > result

where result is the file that will hold the compiled output.
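To make the difference between matching lines and matching files concrete, here is a small throwaway demo. It assumes mktemp -d is available (GNU and BSD systems both have it), the file names a.txt and b.txt are made up, and it uses grep -E, the POSIX spelling of egrep:

```shell
# Scratch directory with two made-up sample files.
demo_dir=$(mktemp -d)
printf 'the key and the word\nan unrelated line\n' > "$demo_dir/a.txt"
printf 'only the key here\n' > "$demo_dir/b.txt"

# Line-level: -h drops the "file:" prefix and prints only the matching line
# from a.txt; the rest of a.txt is not included.
grep -E -h 'key.*word|word.*key' "$demo_dir"/*

# File-level: -l prints only the name of each file that has a match,
# which is what you would then hand to cat.
grep -E -l 'key.*word|word.*key' "$demo_dir"/*
```

Run rm -r "$demo_dir" afterwards to clean up.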

Guru.

agama · August 7, 2010, 11:52pm

I read it a bit differently: the OP wants the entire contents of any file that contains some combination of the keywords, not just the lines that contain them. If that's correct, one of the following will work, depending on how a match is defined. The first treats a file as matched if any of the keywords appears anywhere in it; the second requires both keywords on the same line:

egrep -l "key|word" * | xargs cat >result

egrep -l "key.*word|word.*key" * | xargs cat >result
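Neither command covers a third reading: files that contain all of the keywords, but possibly on different lines. A minimal sketch for that case follows. The function name cat_files_with_all is hypothetical, the keywords are treated as basic regular expressions, and looping over the glob in the shell (rather than handing * to an external command) keeps long file lists and names with spaces intact:

```shell
# Hypothetical helper: print the full contents of every regular file in DIR
# that contains ALL of the given words somewhere in the file (the words do
# not have to be on the same line).
# Usage: cat_files_with_all DIR WORD...
cat_files_with_all() {
  cfa_dir=$1
  shift
  for cfa_file in "$cfa_dir"/*; do
    [ -f "$cfa_file" ] || continue          # skip directories and the like
    cfa_all=yes
    for cfa_word in "$@"; do
      # grep -q reports only through its exit status; give up on this
      # file as soon as one word is missing.
      if ! grep -q -- "$cfa_word" "$cfa_file"; then
        cfa_all=no
        break
      fi
    done
    if [ "$cfa_all" = yes ]; then
      cat "$cfa_file"
    fi
  done
}
```

For example, cat_files_with_all . key word > ../result writes the combined files one directory up, so the result file is never among the inputs.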

drumminfool91 · August 8, 2010, 12:10am

Thanks so much! The egrep -l "system.*log|log.*system" * | xargs cat > result command is exactly what I'm looking for. I'm trying to understand xargs, though. What does it do for me here? I only ask because I've never used it before.

agama · August 8, 2010, 12:24am

The command could also have been written like this:

cat $(egrep -l "pattern" *)

except that egrep might output more file names than fit on a single command line. Piping the output to xargs avoids that: xargs runs the command you give it (cat in this case) with as many arguments as fit within the system's command-line limit, and runs it several times if necessary to get through the whole list.
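A quick way to watch xargs split its input into several command invocations is its POSIX -n option, which caps the number of arguments per invocation (the limit of 2 is just for the demo):

```shell
# Five arguments, at most two per run, so echo is invoked three times.
printf '%s\n' a b c d e | xargs -n 2 echo
# a b
# c d
# e

# Without -n, xargs packs in as many as fit, so a short list needs one run.
printf '%s\n' a b c d e | xargs echo
# a b c d e
```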

With that said, I realise my original command is also flawed: the '*' given to egrep might itself expand into too many arguments, so egrep fails with an "argument list too long" error before xargs ever gets involved. This would probably be better:

ls | xargs egrep -l "system.*log|log.*system" | xargs cat > result

The ls command writes the file names to a pipe instead of onto a command line, so it is not affected by the limit. The first xargs passes those names to egrep in batches, and the second xargs passes the names of the matching files to cat.
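One caveat with the ls | xargs pipeline is that xargs splits its input on blanks and newlines (and treats quotes specially), so a file name such as "my log.txt" is passed on as two separate arguments. A sketch of an alternative, assuming a POSIX find that supports -exec ... {} +, lets find do the batching itself. The function name cat_matching_files is hypothetical, grep -E is the POSIX spelling of egrep, and unlike ls this version also descends into subdirectories (GNU and BSD find accept -maxdepth 1 to stop that):

```shell
# Hypothetical helper: concatenate every regular file under DIR (recursively)
# that has at least one line matching the extended regex PATTERN.
# Usage: cat_matching_files DIR PATTERN
cat_matching_files() {
  # -exec grep -q ... \;  runs grep once per file and acts as a filter;
  #                       a file only reaches the next -exec if grep matched.
  # -exec cat {} +        collects the files that passed and runs cat on them
  #                       in batches that fit the command-line limit, much as
  #                       xargs does, but file names are never re-split.
  find "$1" -type f -exec grep -E -q -- "$2" {} \; -exec cat {} +
}
```

For the thread's example: cat_matching_files . 'system.*log|log.*system' > ../result. Writing the output outside the searched directory matters here, because the shell creates result before find starts, so a result file inside . would be searched along with everything else.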

When writing a script, it is usually best to assume the worst case and use xargs here. If you're willing to live with the occasional error, and are fairly sure a directory will never hold more files than fit on one command line, then the first form (without xargs) is the simplest and most straightforward.
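The command-line limit is actually a size in bytes rather than a count of arguments: it covers the total length of the arguments plus the environment handed to a newly started program. To see the value on a given system, getconf can report it:

```shell
# Maximum combined size, in bytes, of the arguments and environment that can
# be passed to a newly executed program. An expanded "*" handed to an
# external command such as egrep or cat has to fit within this.
getconf ARG_MAX
```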

kurumi · August 8, 2010, 10:17pm

cat $(grep -E -l "word1.*word2|word2.*word1" *)
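The difference between * and .* is easy to trip over. In a regular expression, * repeats the single item before it, so word1*word2 means "word", zero or more "1"s, then "word2"; it is .* that allows arbitrary text between the two words. Plain grep also uses basic regular expressions, where | is an ordinary character, so the alternation needs grep -E (or egrep). A small demo, using a hypothetical helper named matches that prints whether one line of text matches:

```shell
# Hypothetical helper: print "match" or "no match" for one line of TEXT.
# Any further arguments (e.g. -E and the pattern) are passed to grep.
matches() {
  m_text=$1
  shift
  if printf '%s\n' "$m_text" | grep -q "$@"; then
    echo match
  else
    echo 'no match'
  fi
}

matches 'word1 and word2' -E 'word1*word2'                 # no match
matches 'word1 and word2' -E 'word1.*word2'                # match
matches 'word2 then word1' 'word1.*word2|word2.*word1'     # no match: | is literal here
matches 'word2 then word1' -E 'word1.*word2|word2.*word1'  # match
```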