Bash: Checking a large file for specific lines

Morning ..

I have a file with approximately 1000 lines. I want to check that the file contains, for example, 100 specific lines.
Something like what's given below is ugly, and even if I create a function I have to call it 100 times.
I may need to look through multiple files at times.

Is there a "cleaner" way to do this ?

x=$(grep "now is the time" 1000linefile.txt)
 if [ -z "$x" ]
  then
   echo "now is the time not found"
 fi 

y=$(grep "for all good men" 1000linefile.txt)
 if [ -z "$y" ]
  then
   echo "for all good men not found"
 fi 

Thanks !!! AND HAPPY FRIDAY

---------- Post updated at 11:15 AM ---------- Previous update was at 10:48 AM ----------

Well, I guess I answered my own question .. once I thought about it.
Here *.txt matches the 1000-line file(s), and linestocheckfile contains the 100 lines that I want to be sure are in the large file.

for x in *.txt
do
 while IFS= read -r line
  do
   z=$(grep -- "$line" "$x")
    if [ -z "$z" ]
     then
      echo "$line is missing from $x"
    fi
 done < linestocheckfile
done
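
For reuse across several files, the loop above can be wrapped in a function. This is only a sketch: the name check_lines is made up for illustration, and it assumes the wanted lines should be matched as fixed strings (grep -F) rather than as regular expressions.

```shell
# check_lines LISTFILE FILE...   (hypothetical helper name)
# Reports every line of LISTFILE that does not occur in each FILE.
# Returns 0 if nothing is missing, 1 otherwise.
check_lines() {
    local list=$1 file line status=0
    shift
    for file in "$@"; do
        while IFS= read -r line; do
            # -F: treat the line as a fixed string; -q: no output, stop at first match
            if ! grep -Fq -- "$line" "$file"; then
                echo "$line is missing from $file"
                status=1
            fi
        done < "$list"
    done
    return "$status"
}
```

It could then be called as check_lines linestocheckfile *.txt, and the return status used in an if.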


a bit simplified:

if [ "$(grep -c mySearchPattern myFile)" -eq 0 ]; then
   echo 'not found'
else
   echo 'found'
fi

I would go further:-

if $(grep -q "now is the time" 1000linefile.txt)
then
   echo "It's a HIT"
else
   echo "Missed"
fi

The extra benefit is that the -q flag causes grep to exit as soon as it has found a match, so with a properly large file (more than just a few disk blocks) it won't have to read the whole file when a match exists, saving you I/O time.

Potentially you could extend the search using a regular expression, but that depends on you knowing what you are looking for. If you want to know the state of each phrase, you would have to run right through the file twice, once per phrase. If you only want to know whether either is present, then:-

if $(egrep -q "now is the time|for all good men" 1000linefile.txt)
then
   echo "It's a HIT"
else
   echo "Missed"
fi

I hope that this helps,
Robin


I'd propose dropping the "command substitution" $( ... ), as it's not needed in the above construct.
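
Without the command substitution, the test could look like the sketch below. `if` acts directly on grep's exit status, which is 0 when a match was found and 1 when not. The function name hit_or_miss is only for illustration.

```shell
# hit_or_miss PATTERN FILE   (hypothetical helper name)
hit_or_miss() {
    # No $( ... ) needed: if tests grep's exit status directly.
    if grep -q -- "$1" "$2"; then
        echo "It's a HIT"
    else
        echo "Missed"
    fi
}
```

For example: hit_or_miss "now is the time" 1000linefile.txt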

If you have all target lines in a file, and you are sure they occur no more than once in the large file, you could try

if [ $(grep -cwf targetlines 1000linesfile) = $(wc -l < targetlines) ]
then
   echo "Found all"
else
   echo "Some missing"
fi

To allow for duplicates of target lines, try $(grep -owf targetlines 1000linesfile | sort -u | wc -l) in the if construct above.
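
To see which target lines are missing, rather than only whether some are, the roles of the two files can be swapped. This is a sketch under a stronger assumption than the commands above: each target has to equal a complete line of the large file (-x) and is taken as a fixed string (-F), not matched as a word or pattern. The function name missing_lines is made up.

```shell
# missing_lines TARGETS BIGFILE   (hypothetical helper name)
# Prints each line of TARGETS that is not a whole line of BIGFILE.
# Exit status follows grep: 0 if something is missing, 1 if all were found.
missing_lines() {
    # Patterns are read from BIGFILE (-f); -v keeps the TARGETS lines matching none of them.
    grep -Fxv -f "$2" -- "$1"
}
```

For example: missing_lines targetlines 1000linesfile. Duplicates in the large file do not matter here, because each target line is printed at most once.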


Thanks guys !!