When it comes to programming and UNIX, I know just enough to be really, really dangerous.
I have written a Python script to parse a file that contains ~1 million lines. Depending on which string a line matches, the line is copied into a particular file. For the sake of brevity, the lines are something like this:
I tried the Python code out on a small file, and everything seems to work. However, since the actual file is massive, I want to double-check it with grep to make sure that the total number of ABC-1's in file x is the same as the number of ABC-1's in file y.
On the command line, I wrote a simple script that will check this for me.
grep "ABC-1" fileName.fna | wc -l
This seems to work just fine.
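To make the x-versus-y comparison concrete, here is a rough sketch of what I mean. The function name `compare_counts` and the file names in the usage line are made up for illustration, and I am assuming that `grep -c` (which counts matching lines) gives the same number as `grep ... | wc -l`.

```bash
#!/bin/bash
# Hypothetical helper: compare how many lines contain a string in two files.
# Usage: compare_counts STRING FILE_X FILE_Y
compare_counts() {
    local pattern=$1 file_x=$2 file_y=$3
    local count_x count_y
    # -c prints the number of matching lines, -F treats the string literally,
    # and -- stops option parsing in case the string starts with a dash.
    # grep exits non-zero when nothing matches, but it still prints 0.
    count_x=$(grep -c -F -- "$pattern" "$file_x") || true
    count_y=$(grep -c -F -- "$pattern" "$file_y") || true
    if [ "$count_x" -eq "$count_y" ]; then
        echo "OK: $pattern appears on $count_x lines in both files"
    else
        echo "MISMATCH: $pattern: $count_x in $file_x, $count_y in $file_y"
        return 1
    fi
}
```

It would be called as `compare_counts "ABC-1" x.fna y.fna`, where `x.fna` and `y.fna` stand in for the real file names.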
Problem: The contents of the original file are copied into 10 other files. I want to check each file AND, since there are ~50 unique strings (e.g. ABC-1), I would like to check for each string. Writing the simple script ~500 times is tedious.
I wrote a bash script but when I execute the file from the command line (
), I get an error saying wc is an illegal option and that it cannot be found.
#!/bin/bash
echo "ABC-1 in fileName.fna"
grep "ABC-1" fileName.fna | wc -l
echo "CCC-33 in fileName3.fna"
grep "CCC-33" fileName3.fna | wc -l
Ideally, I think I should make a vector/list/array of file names and a vector/list/array of searchable strings and use a loop that will print out the string, the filename, and the number of times the string occurs in the file...but I don't know how to do that.
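A rough sketch of the kind of loop described above might look like the following. It is only an illustration under some assumptions: the two-element arrays are placeholders for the real 10 file names and ~50 strings, the function name `count_all` is made up, and `grep -c` is assumed to be an acceptable replacement for `grep ... | wc -l` since both count matching lines.

```bash
#!/bin/bash
# Placeholder lists; replace with the real file names and search strings.
files=(fileName.fna fileName3.fna)
patterns=(ABC-1 CCC-33)

# Print "STRING in FILE: COUNT" for every string/file pair.
count_all() {
    local file pattern
    for file in "${files[@]}"; do
        # Skip files that do not exist instead of letting grep complain.
        [ -f "$file" ] || { echo "skipping $file (not found)" >&2; continue; }
        for pattern in "${patterns[@]}"; do
            # -c counts matching lines, -F treats the string literally,
            # and -- stops option parsing in case the string starts with a dash.
            printf '%s in %s: %s\n' "$pattern" "$file" \
                "$(grep -c -F -- "$pattern" "$file")"
        done
    done
}

count_all
```

With 10 files and ~50 strings in the arrays, this prints one line per pair, so roughly 500 lines of output from a single run.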
So if anyone knows how to rearrange my one-liner script, thank you. If anyone can help me with writing a loop script, thank you. Either option would be awesome.