Bash Scripting (New); Run multiple greps > multiple files

Hi everyone,

I'm new to the forums, as you can probably tell... I'm also pretty new to scripting and writing any type of code.

I need to know exactly how I can grep for multiple strings in files located in one directory, but I need each string's output to go to a separate file.

So I imagine I'd make a separate text file with each string I want to grep for, e.g.:

food1
food2
food3


Then I would use a loop to grep all files in the directory /Desktop/Blah/*, and output to a file named "the string I grepped for".

I would greatly appreciate guidance on this, as loops confuse the hell out of me and I'm not sure what "x" means in the following:

for x in 'cat /Desktop/Blah/*'; grep 'point to strings file' ..........

Where would $x go? Any guidance is helpful. Thanks everyone....

-David

Just to give you an idea, here is a sample script:-

#!/bin/bash

for file in /Desktop/Blah/*              # For each file in directory: /Desktop/Blah/
do
        grep food1 "$file" >> food1.txt  # Grep for food1, redirect & append o/p to file: food1.txt
        grep food2 "$file" >> food2.txt  # Grep for food2, redirect & append o/p to file: food2.txt
        grep food3 "$file" >> food3.txt  # Grep for food3, redirect & append o/p to file: food3.txt
done

You could also use egrep if loops are confusing.

egrep '(food1|food2|food3)' file

However, this may not be the format you want. Either way, knowing both methods will be good tools for later.
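For instance, the loop idea from the sample script scales to a whole patterns file: loop over the patterns rather than hard-coding three greps, sending each pattern's matches to a file named after it. A self-contained sketch (the scratch directory, file names, and patterns below are invented, not from this thread):

```shell
#!/bin/bash
# Sketch: one grep per pattern, output file named after the pattern.
dir=$(mktemp -d)                          # demo data so the sketch is runnable
printf 'I like food1\nplain line\n' > "$dir/a.log"
printf 'food2 rules\n'              > "$dir/b.log"
printf 'food1\nfood2\n'             > "$dir/patterns.txt"

while IFS= read -r pat; do
    # -F: treat pattern as a fixed string; -h: omit filenames from output
    grep -Fh -- "$pat" "$dir"/*.log > "$dir/$pat.txt"
done < "$dir/patterns.txt"
```

One grep call per pattern still covers all files at once, since the glob expands to every log file.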

bipinajith, Azrael:

Thank you both for the replies; I understand the examples you've both posted very well. My goal is to make things a little more complex, while keeping them simple with regard to automating weekly tasks...

Using the example code:

#!/bin/bash

for file in /Desktop/Blah/*              # For each file in directory: /Desktop/Blah/
do
        grep food1 "$file" >> food1.txt  # Grep for food1, redirect & append o/p to file: food1.txt
        grep food2 "$file" >> food2.txt  # Grep for food2, redirect & append o/p to file: food2.txt
        grep food3 "$file" >> food3.txt  # Grep for food3, redirect & append o/p to file: food3.txt
done

I'll have MULTIPLE (80+) strings to grep for, and the list will increase over time.

Editing the following code, how can I have files created that are named after the string that was grepped for?

#!/bin/bash

for file in /Directory/*txt

do 

     grep -f mystringsfile.txt "$file" >> "the string.txt" #This would be an automatically assigned name. I'm guessing a variable would be needed here. How would it be written?

done

Use awk instead:

for file in /Directory/*txt
do
  awk 'NR==FNR{ a[$0]=1;next } { n=0; for(i in a) { if($0~i) { print $0 >> i".txt" }}}' mystringsfile.txt "$file"
done 

It'd take a lot of time to run 80 greps on every file. If you read your word file into an array, you can do it all in one pass, depending on how many files you have and whether you need to handle subdirectories.

If not many words/files, you can try:

#!/bin/bash
wordfile=words.lst

# read in fixed strings from word file
while IFS= read -r line; do
        words+=('-e')
        words+=("$line")
done < "$wordfile"

echo grep -F "${words[@]}" -- dir1/*
mute@clt:~/temp/LDHB2012$ ./script
grep -F -e one -e string with spaces -e foo -e bar -- dir1/file1 dir1/file2 dir1/file3

Remove the 'echo' and adjust as needed.

Edit: oops, sorry, this doesn't meet the requirement of outputting to separate files. While I wasn't looking an awk solution was posted; that'd probably be best.

Alright. I tried what you put here. I changed the .txt file to the actual file with the strings and changed the actual directory with the files to be searched.

Everything else I kept the same. It looks like this:

#!/bin/bash
for file in /ActualDirectory/*.txt

do

     awk 'NR==FNR{ a[$0]=1;next } {n=0; for(i in a) { if($0~i) { print $0 >> i".txt" }}}' stringsfile.txt $file

done

Here's my error:

awk: syntax error at source line 1
   context is 
               NR==FNR{ a[$0]=1;next } {n=0; for(i in a) { if($0~i) { print $0 >> >>> i".txt" <<<
awk: illegal statements at source line 1
   context is 
               NR==FNR{ a[$0]=1;next } {n=0; for(i in a) { if($0~i) { print $0 >> >>> i".txt" <<<
awk: syntax error at source line 1

-David

Use /usr/xpg4/bin/awk instead on Solaris/SunOS:

/usr/xpg4/bin/awk 'NR==FNR{ a[$0]=1;next } {n=0; for(i in a) { if($0~i) { print $0 >> i".txt" }}}' stringsfile.txt $file

Sorry to play message tag with you... I'm on a Mac, obviously using OS X. I read that creating awk script files would be best and that using BEGIN and END seems to be pertinent. However, awk is very confusing to me and I'm not sure where to go with this one.

"/usr/xpg4/bin/awk" is a bad interpreter.

I'm not sure about OSX.

I tested this code on the following systems and it works fine:

  • gawk in GNU/Linux
  • awk in HP-UX
  • /usr/xpg4/bin/awk in SunOS

The trouble is with:

{ print $0 >> i".txt" }

Try:

{ f=i ".txt"; print>f }

That's awesome. I had some things on my end to fix. But finally got it working, thanks to y'all. I appreciate the help.

While it finally works, I'd still like to understand what I'm typing; most of it makes sense. But why wouldn't this work:

{ print $0 >> i".txt" }

But this did:

{ f=i ".txt"; print>f }

A quick explanation would suffice, if anything. I'm just making sure the coding sticks; I would hate to ask the same question twice.

Thanks again!

-David

I think for some awks it is ambiguous what the target of the redirection is. To remove this ambiguity, you could either assign the composition of the filename to a variable first, like I suggested, or use brackets:

{ print > ( i ".txt" ) }
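A minimal, self-contained way to see the parenthesised form working (the scratch directory and file name are made up):

```shell
# With parentheses, the whole concatenation is the redirection target,
# which every awk parses the same way.
dir=$(mktemp -d)
echo hello | awk -v d="$dir" '{ print > (d "/out" ".txt") }'
cat "$dir/out.txt"    # prints: hello
```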

Scrutinizer,

This code was given to me in this thread. I'm now going through some modifications to make this logic work with other files. Would you be able to take me through the code, piece by piece, so I can understand how to modify/create code for different things using awk?

awk 'NR==FNR{ a[$0]=1;next } { n=0; for(i in a) { if($0~i) { f=i ".txt"; print>f }}}' infile $file

I tried using the code with a file that contains a list of things to search for. When I run it, it reports the following for one of the files to be created:

awk: XYZ.txt makes too many open files
input record number 4845, file XXXXXX.files
source line number 1

XYZ.txt is the file being created
XXXXXX.files is one of the files being searched

Source line 1 of the script looks like this

for file in /directory/*.files

This works with the original file of search terms, but when using a different one with a longer list, I get those errors and the created files are almost empty.

Any help is appreciated. My real goal is to understand the code and be able to modify it myself if needed.

Thanks a bunch!

-David

Hi, first, try these modifications to that code:

awk 'NR==FNR{A[$1]; next} { for(i in A) if($0~i){f=i ".txt"; print>>f; close(f) }}' infile "$file"
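The close(f) call is the important part: awk keeps every redirection target open until it is closed, and traditional awks abort once the per-process limit is reached, which is what the "makes too many open files" error indicated. A tiny self-contained check (the scratch directory and output names are invented):

```shell
# Each input line goes to its own file; close(f) releases the file
# descriptor immediately, so the number of distinct files no longer matters.
dir=$(mktemp -d)
seq 1 5 | awk -v d="$dir" '{ f = d "/out" $1 ".txt"; print >> f; close(f) }'
ls "$dir" | wc -l    # 5 output files
```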

That seems to work like a charm.

So if I wanted to create a separate file that has EVERYTHING else (the lines not matching anything in the "infile"), how would that else statement look?

Try:

awk 'NR==FNR{A[$1]; next} { for(i in A) if($0~i){f=i ".txt"; print>>f; close(f); next }}1'  infile "$file" > file.other

With the one that I already mentioned had worked: is there a reason why the script takes a fairly large amount of time to complete, while creating just one file for one search string takes 2 seconds?

Is there a more efficient way of writing the code to make it faster? Is there another way to write this code that makes errors easier to find, where I can print "something" when there's an error?

---------- Post updated at 04:57 PM ---------- Previous update was at 01:49 PM ----------

Good Afternoon Scrutinizer,

I'm testing this code and there's something very funny about it. First off, a file is created IMMEDIATELY for each string being searched, but the search isn't finished until 30 minutes later. Manually grepping these strings out of the whole directory takes less than 2 seconds.

The results of the manual search and of the finished script are different. Also, the script returns different results when run once, with its output deleted, and then run again. None of the data changed; only the output files have discrepancies. I'm not sure why this would be...

-David

Hi David, I had not really looked at your problem, just provided a fix for the ambiguous redirection and then suggested some improvements on that.
OK, I see the awk is supposed to be part of a for loop, is that correct? How many files are typically in that directory?

There most likely should be a for loop. While I ask these questions and try to come up with solutions for easier processing of these logs, I'm taking small tutorials on the different commands (awk, sed, uniq, cut, sort). Obviously these things will give me statistics that would take extremely long to produce manually, considering the amount of data present. So, in short, I believe it's a loop I'm looking for. =}

The directory contains roughly 50 files, and the number will continue to grow by 1 per day. So the idea is to incorporate into the script something that will search only the NEW files and append to the current, existing output files.
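One common way to do that is to keep a running list of files already searched and skip them on later runs. A self-contained sketch of the idea, reusing the awk from earlier in the thread (the scratch directory and every file name here are invented for the demo):

```shell
#!/bin/bash
# Demo setup: a scratch directory with two "daily" log files and a strings file.
dir=$(mktemp -d)
printf 'foo here\n' > "$dir/day1.files"
printf 'bar here\n' > "$dir/day2.files"
printf 'foo\nbar\n' > "$dir/strings.txt"

seen="$dir/processed.lst"        # running list of files already searched
touch "$seen"

run_new() {                      # process only files not yet in the list
    for file in "$dir"/*.files; do
        grep -qxF "$file" "$seen" && continue
        awk 'NR==FNR{A[$1]; next} { for(i in A) if($0~i){f=d "/" i ".txt"; print>>f; close(f)} }' \
            d="$dir" "$dir/strings.txt" "$file"
        printf '%s\n' "$file" >> "$seen"
    done
}

run_new                          # first run: processes day1 and day2
printf 'more foo\n' > "$dir/day3.files"
run_new                          # second run: only the new day3 file
```

The second call appends "more foo" to foo.txt without re-reading day1 or day2, which is the behavior you'd want from a daily cron job.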

Then, after that is running smoothly as a cron job, I will look into making a script that will give me stats on particular fields (I'm assuming, using the awk, cut, uniq, and sed commands).
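As a small taste of that kind of stats pipeline (the sample data and field number below are made up): count how often each value appears in a field, most frequent first.

```shell
# cut picks field 2, sort groups duplicates, uniq -c counts each group,
# and the final sort -rn puts the biggest counts first.
printf 'a x\nb y\na x\nc x\n' |
    cut -d' ' -f2 | sort | uniq -c | sort -rn
```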

Scrutinizer, feel free to point me in a general direction to get started, instead of spelling everything out. The examples you've listed in this thread do seem to be at an expert level of scripting, whereas many tutorials I've been through seem to put each command on a separate line and include debug output in case something about the files changes or it simply stops working someday.

-David