I am trying to find a solution to a request here at work. I have been asked to do a full-text search of around 300,000 files for multiple content items.
The following words need to appear in the file:
(april and/or may) and pie and (red and/or white).
So a file with the words april, pie, and white would be valid, but a file with only the words april and white would not.
A file with all the words is also OK. I have tried egrep with a regex, but it's not doing what I need, so I figured I needed to write something in shell or Perl.
Thank you for the quick reply. When I try to do this, it gives me the following errors. The first egrep works, but the | xargs is where it fails. Any ideas on why I might be getting this? If it matters, this is AIX 6.1. I have full read/write access to these files. Those are the only 2 files in the test directory I am using to see if I can make it work.
grep: 0652-033 Cannot open test
grep: 0652-033 Cannot open 1.txt
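Those two messages look like what xargs produces when a file name contains a space: by default it splits its input on whitespace, so a name such as "test 1.txt" (an assumption here, since the actual file names are not shown) reaches grep as two arguments, "test" and "1.txt". A minimal sketch of one way around it is to let find pass the names to the first grep directly and read the survivors one line at a time. The function name match_all is made up for illustration, and the sketch assumes that find on your system accepts the POSIX "{} +" form and that no file name contains a newline.

```shell
# match_all DIR: print files under DIR that contain
# (april or may) and pie and (red or white), ignoring case.
# Hypothetical helper; names are read line by line, so embedded
# spaces are preserved (embedded newlines are not handled).
match_all() {
    # First pass: find hands whole file names to grep, no xargs involved.
    find "$1" -type f -exec grep -il 'pie' {} + |
    while IFS= read -r f
    do
        # Second and third conditions, checked quietly on the quoted name.
        grep -iqE 'april|may' "$f" &&
        grep -iqE 'red|white' "$f" &&
        printf '%s\n' "$f"
    done
}
```

For example, match_all /path/to/files would list the matching files one per line. If "{} +" is not accepted, ending the -exec with \; instead should behave the same, only more slowly.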
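#!/bin/ksh
# ok FILE: exit status 0 if FILE contains
# (april or may) and pie and (red or white), otherwise 1.
# Matching ignores case and is by substring, so "may" also matches "maybe".
ok()
{
awk ' { line = tolower($0) }
!pie && line ~ /pie/ { pie = 1 }
!month && line ~ /april|may/ { month = 1 }
!colour && line ~ /red|white/ { colour = 1 }
pie && month && colour { exit }   # all three found: stop reading, go to END
END { exit( (pie && month && colour) ? 0 : 1 ) }' "$1"
}
# Read names one per line so that spaces and backslashes survive intact.
find /path/to/files -type f |
while IFS= read -r fname
do
ok "$fname" && print -r -- "$fname"
done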
If the files are large and those keywords are scarce, then any solution has the potential to take a very, very long time. Your "requirements" make the request seem more like homework, which I hope it is not. We have rules for homework.
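Part of that time is process startup: with about 300,000 files, running one awk per file means about 300,000 separate awk processes. As a rough sketch, not a measured improvement, a single awk can be handed many files at once and reset its flags whenever a new file begins. The function name scan_batch is invented for illustration, and the sketch assumes an awk with user-defined functions and tolower (any POSIX awk) and file names that are full paths, so that none of them looks like a var=value assignment to awk.

```shell
# scan_batch FILE...: print each FILE that contains
# (april or may) and pie and (red or white), ignoring case.
# Hypothetical helper: one awk process handles the whole batch.
scan_batch() {
    awk '
    # Print the file just finished if it had all three groups.
    function report() { if (name != "" && pie && month && colour) print name }

    # First line of a new file: report the previous one, then reset.
    FNR == 1 { report(); name = FILENAME; pie = month = colour = 0 }

    # Everything already found in this file: skip its remaining lines.
    pie && month && colour { next }

    { line = tolower($0) }
    line ~ /pie/       { pie = 1 }
    line ~ /april|may/ { month = 1 }
    line ~ /red|white/ { colour = 1 }

    END { report() }
    ' "$@"
}
```

To drive it from find, the awk program could be saved in a file (say scan.awk, a made-up name) and run as find /path/to/files -type f -exec awk -f scan.awk {} + so that find fills each awk command line with as many names as fit. The files still have to be read, so large files with scarce keywords will remain slow.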
The requirements are for lawyers; we need the full-text search for some litigation. I changed the keywords because, well, no one needs to know what we are searching for. Thank you for your code. I appreciate the help, as I am not good at scripting. If you would like confirmation that this is for a professional purpose, I would be more than happy to provide it.