Problem using grep in bash script

errcricket · October 17, 2011, 10:59am

When it comes to programming and UNIX, I know just enough to be really, really dangerous.

I have written a Python script to parse a file that contains ~1 million lines. Depending on whether a certain string is matched, the line is copied into a particular file. For the sake of brevity, the lines are something like this:

I tried the Python code on a small file, and everything seems to work. However, since the actual file is massive, I want to double-check it with grep to make sure that the number of ABC-1's in file x is the same as the number of ABC-1's in file y.
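A minimal sketch of that double-check as a pair of reusable bash functions, assuming a grep that supports -F and -c (GNU and BSD grep both do). The names count_matches and same_count are hypothetical. Like grep ... | wc -l, this counts matching lines, not individual occurrences.

```shell
#!/bin/bash
# Minimal sketch: does a string appear on the same number of lines in two files?
# count_matches and same_count are hypothetical helper names.

count_matches() {
    # -F: fixed string (no regex), -c: print the number of matching lines.
    # "--" stops a string that starts with "-" from being read as an option.
    # grep exits 1 when nothing matches (it still prints 0), so treat that as
    # success while letting real errors such as a missing file (status 2) through.
    grep -Fc -- "$1" "$2" || [ $? -eq 1 ]
}

same_count() {
    local pattern=$1 original=$2 copy=$3
    local a b
    a=$(count_matches "$pattern" "$original") || return 2
    b=$(count_matches "$pattern" "$copy") || return 2
    if [ "$a" -eq "$b" ]; then
        echo "OK: $pattern is on $a lines in both $original and $copy"
    else
        echo "MISMATCH: $pattern: $a lines in $original, $b lines in $copy"
        return 1
    fi
}
```

For example, same_count "ABC-1" fileName.fna ABC-1_out.fna (the second file name is a placeholder) prints an OK or MISMATCH line and returns non-zero on a mismatch, so it can also be used in an if statement.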

On the command line, I wrote a simple one-liner to check this for me:

grep "ABC-1" fileName.fna | wc -l

This seems to work just fine.

Problem: The contents of the original file are copied into 10 other files. I want to check each file, AND since there are ~50 unique strings (e.g. ABC-1), I would like to check for each string. Writing the one-liner ~500 times is tedious.

I wrote a bash script, but when I execute the file from the command line (

), I get an error saying wc is an illegal option and that it cannot be found.

#!/bin/bash
echo "ABC-1 in fileName.fna"
grep "ABC-1" fileName.fna | wc -l

echo "CCC-33 in fileName3.fna"
grep "CCC-33" fileName3.fna | wc -l

Ideally, I think I should make a vector/list/array of file names and a vector/list/array of search strings, and use a loop that prints the string, the file name, and the number of times the string occurs in the file... but I don't know how to do that.
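One way to write that loop, sketched under the assumption that the script runs under bash (arrays are a bash feature, not plain sh). The entries in files and patterns are placeholders for the real ~10 files and ~50 strings, and report_counts is a hypothetical name. Each output line holds the string, the file, and the number of lines in that file containing the string.

```shell
#!/bin/bash
# Count how many lines of each file contain each search string.
# The file names and search strings are hypothetical placeholders; replace them
# with your ~10 output files and ~50 strings.

files=(fileName.fna fileName2.fna fileName3.fna)
patterns=("ABC-1" "CCC-33")

report_counts() {
    local f p n
    for f in "${files[@]}"; do
        # Skip (and report) files that are missing or unreadable.
        if [ ! -r "$f" ]; then
            echo "cannot read $f" >&2
            continue
        fi
        for p in "${patterns[@]}"; do
            # -F: fixed string, -c: count matching lines. grep exits 1 when
            # the count is 0, so "|| true" keeps a "set -e" script running.
            n=$(grep -Fc -- "$p" "$f" || true)
            # One tab-separated line per (string, file) pair.
            printf '%s\t%s\t%s\n' "$p" "$f" "$n"
        done
    done
}

report_counts
```

Saved as, say, count_strings.sh (a hypothetical name), it can be run with bash count_strings.sh > counts.tsv, which leaves one row per string/file pair to compare. The nested loop reads each file once per string; with 50 strings and a million lines that is still usually quick, and it keeps the script easy to follow.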

So if anyone knows how to rearrange my one-liner script, thank you. If anyone can help me write a loop script, thank you. Either option would be awesome.

otheus · October 17, 2011, 11:19am

Hrm. I don't understand why the bash script thinks wc is an option of grep, but really, this is the wrong approach:

If the file is sorted, you can do:

uniq -c

If it's not sorted, you can sort it first with sort. If you run out of memory, you can split the file into n files, sort each one individually, and then merge them with sort (see the man page for the merge option, -m). The resulting file will then be sorted, and you can use the uniq command above.
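A sketch of the sort/uniq suggestion as two bash functions (the names are hypothetical). Note that uniq -c counts runs of identical whole lines, so this gives one count per distinct line; if the key is only part of each line, it would have to be extracted first, and since the line format isn't shown above that step is left out. The second function follows the split, sort, merge recipe; its default chunk size of 100000 lines is an arbitrary assumption, and GNU sort already uses temporary files for large inputs, so the plain version is often enough.

```shell
#!/bin/bash
# Sketch of the sort | uniq -c approach; the function names are hypothetical.
# uniq -c counts runs of identical *whole* lines, so the input must be sorted
# first to bring identical lines together.

count_unique_lines() {
    sort -- "$1" | uniq -c
}

# Variant following the split / sort / merge recipe for very large files.
# The default chunk size (100000 lines) is an arbitrary assumption.
count_unique_lines_split() {
    local input=$1 lines=${2:-100000} dir chunk
    dir=$(mktemp -d) || return 1
    # Cut the input into chunk.aa, chunk.ab, ... of at most $lines lines each.
    split -l "$lines" "$input" "$dir/chunk."
    # Sort every chunk in place (sort may write its output over its input).
    for chunk in "$dir"/chunk.*; do
        sort -o "$chunk" "$chunk"
    done
    # -m merges files that are already sorted, without re-sorting them.
    sort -m "$dir"/chunk.* | uniq -c
    rm -rf "$dir"
}
```

Both functions should print the same counts for the same input; the split variant only changes how much data sort holds at once.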

But let's say you just want to count the number of lines matching a particular string. Try:

grep -Fc string filename

The -F ensures your string won't be interpreted as a regular expression pattern, and -c prints the number of matching lines instead of the lines themselves.
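A small demonstration of what -F changes, using made-up sample lines. It also shows that grep -c gives the same number as piping to wc -l, and that a fixed string still matches as a substring: ABC-1 is found inside ABC-12. The -w option (supported by GNU and BSD grep, though not required by POSIX) restricts matches to whole words, which may or may not suit the real data.

```shell
#!/bin/bash
# Sketch: why -F matters when the search string contains regex metacharacters.
# The sample lines and the "ABC.1" string are made up for illustration.

sample=$'ABC-1 first\nABC.1 second\nABC-12 third'

# As a regular expression, "." matches any character, so all 3 lines match.
echo "$sample" | grep -c 'ABC.1'      # prints 3

# With -F the dot is literal, so only the "ABC.1" line matches.
echo "$sample" | grep -Fc 'ABC.1'     # prints 1

# grep -c counts matching lines, the same number as "grep ... | wc -l".
# Note that "ABC-1" also matches inside "ABC-12".
echo "$sample" | grep -F 'ABC-1' | wc -l   # 2 (wc may pad with spaces)

# -w only accepts matches that are whole words, so "ABC-12" is excluded.
echo "$sample" | grep -Fwc 'ABC-1'    # prints 1
```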

errcricket · October 17, 2011, 11:52am

Thank you, otheus. Using your command from the command line works too. As for running the file from the command line... it does not work, but I suspect it has something to do with the execution path. Regardless, I think I am on the correct $Path to solving this problem.

otheus · October 17, 2011, 12:33pm

Oh, right. You need to do:

./scriptname.sh

Alternatively, you can do:

PATH=$PATH:.
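A sketch of the options from this thread, run in a throwaway temporary directory with a stand-in scriptname.sh whose one-line contents are made up. It also adds chmod +x: running a script by path requires the execute bit, and without it ./scriptname.sh fails with "Permission denied". Passing the file to bash directly works even without it.

```shell
#!/bin/bash
# Sketch: three ways to run a script that lives in the current directory.
# "scriptname.sh" is the placeholder name used in the thread; its contents
# here are a made-up one-liner.

cd "$(mktemp -d)" || exit 1
printf '#!/bin/bash\necho "hello from scriptname.sh"\n' > scriptname.sh

# The file needs execute permission before it can be run by path.
chmod +x scriptname.sh

# 1. Give an explicit path, so the shell does not search $PATH.
./scriptname.sh

# 2. Or pass the file to bash; no execute bit is needed this way.
bash scriptname.sh

# 3. Or append the current directory to PATH (for this shell session only).
PATH=$PATH:.
scriptname.sh
```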

errcricket · October 17, 2011, 5:24pm

Thanks again, otheus.