Hi,
I'm trying to search for a particular phrase in a large number of PDFs in a particular directory.
What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears.
find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase"
I've been told that this could be achieved using pdfgrep, but I don't have root access on this machine, and when I tried to install it, some libraries appeared to be missing, so I would prefer to do this using pdftotext. Many thanks!
If you create a file named grepPDF
containing:
#!/bin/ksh
for i in "$@"
do
pdftotext "$i" - | grep 'search phrase' > grepPDF.$$ && \
printf "\nFollowing lines are in %s:\n" "$i" && \
cat grepPDF.$$
done
rm grepPDF.$$
and make it executable:
chmod +x grepPDF
and move it to a directory that is in your search path, then the find command:
find . -name '*.pdf' -exec grepPDF {} +
should do what you want.
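If you would rather avoid the temporary file, a minimal sketch along the same lines is below. It assumes only that pdftotext is on your PATH; the function name grep_pdf is made up for illustration. It prefixes every matching line with the name of the file it came from:

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a standard tool).
# Usage: grep_pdf PHRASE FILE...
# Prints "file: matching line" for every line containing PHRASE.
grep_pdf() {
    phrase=$1
    shift
    for f in "$@"
    do
        # The trailing "-" tells pdftotext to write the text to stdout.
        # grep -F treats the phrase as a fixed string, not a regex.
        pdftotext "$f" - 2>/dev/null |
            grep -F -- "$phrase" |
            while IFS= read -r line
            do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}
```

For a single directory you could call it as grep_pdf 'search phrase' ./*.pdf. To recurse, one option is to save it as a script whose last line is grep_pdf "$@" and run it from find with -exec, passing 'search phrase' before the {} + part.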
clx
September 24, 2012, 9:23am
Give this a try:
find . -name '*.pdf' -type f | while IFS= read -r FILE
do
# The "-" makes pdftotext write to stdout; without it the text goes to a .txt file
pdftotext "$FILE" - | grep -q "search phrase" && echo "$FILE"
done
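One caveat with piping find into while read is that a file name containing a newline gets split in two. A rough sketch that lets find hand the names to a small inline script instead is below; the function name list_pdfs_with is made up for illustration, and it again assumes pdftotext is on your PATH:

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a standard tool).
# Usage: list_pdfs_with PHRASE [DIR]
# Prints the name of every PDF under DIR (default: .) whose text contains PHRASE.
list_pdfs_with() {
    phrase=$1
    dir=${2:-.}
    # find passes the file names as arguments, so no name is ever
    # parsed from a text stream. "sh" fills $0 of the inline script.
    find "$dir" -type f -name '*.pdf' -exec sh -c '
        phrase=$1
        shift
        for f
        do
            if pdftotext "$f" - 2>/dev/null | grep -qF -- "$phrase"
            then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$phrase" {} +
}
```

Called as list_pdfs_with 'search phrase' it searches from the current directory, like the loop above.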