Search PDFs from the command line

Hi,

I'm trying to search for a particular phrase in a large number of PDFs in a particular directory.

What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears.

find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase"

I've been told that this could be achieved using pdfgrep, but I don't have root access on this machine, and I appear to be missing some libraries when I try to install it, so I would prefer to do this using pdftotext. Many thanks!

If you create a file named grepPDF containing:

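#!/bin/ksh
# For each PDF given as an argument, print its matching lines under a header naming the file.
for i in "$@"
do
        # The header and the matches are printed only when grep finds the phrase.
        pdftotext "$i" - | grep 'search phrase' > grepPDF.$$ && \
                printf "\nFollowing lines are in %s:\n" "$i" && \
                cat grepPDF.$$
done
# -f keeps rm quiet if the temporary file was never created.
rm -f grepPDF.$$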

and make it executable:

chmod +x grepPDF

and move it to a directory that is in your search path, then the find command:

find . -name '*.pdf' -exec grepPDF {} +

should do what you want.
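
If you would rather have the file name on every matching line (the way grep -H prints it), here is a rough sketch of the same idea as a shell function. The name pdfsearch and its two arguments (the phrase, then an optional starting directory) are made up for illustration, and it assumes file names contain no newlines:

```shell
#!/bin/sh
# pdfsearch PHRASE [DIR]  (hypothetical helper, not an existing tool)
# Prints "file: matching line" for every hit in every PDF under DIR.
pdfsearch() {
    phrase=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r file
    do
        # pdftotext writes to standard output when the output file is "-".
        pdftotext "$file" - 2>/dev/null | grep -- "$phrase" |
        while IFS= read -r line
        do
            # Prefix each matching line with the PDF it came from.
            printf '%s: %s\n' "$file" "$line"
        done
    done
}
```

After sourcing it (or pasting it into your shell), it could be called as: pdfsearch "search phrase" .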

Give this a try:

find . -name '*.pdf' -type f | while IFS= read -r FILE
do
 # The trailing "-" sends the text to stdout; without it pdftotext writes a .txt file instead.
 pdftotext "$FILE" - | grep -q "search phrase" && echo "$FILE"
done
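
A variation on listing just the file names, without the while loop: find treats -exec ... \; as a test that is true when the command exits 0, so a -print placed after it fires only for PDFs where grep -q found the phrase. This is only a sketch; the function name pdf_files_with and its arguments are invented for the example:

```shell
#!/bin/sh
# pdf_files_with PHRASE [DIR]  (hypothetical helper, not an existing tool)
# Prints the name of each PDF under DIR whose text contains PHRASE.
pdf_files_with() {
    phrase=$1
    dir=${2:-.}
    # Inside sh -c, $1 is the PDF found by find and $2 is the phrase.
    # The pipeline's exit status is that of grep -q, which decides
    # whether find goes on to -print the file name.
    find "$dir" -type f -name '*.pdf' \
        -exec sh -c 'pdftotext "$1" - 2>/dev/null | grep -q -- "$2"' sh {} "$phrase" \; \
        -print
}
```

Because the file name is passed to sh as an argument rather than read from a pipe, names with spaces or odd characters should not need special handling.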