Pdf to text

Is there a way to use the pdftotext utility to convert all the PDF files in a given directory?

So instead of one at a time:

pdftotext hp-manual.pdf hp-manual.txt

all 50 PDF files in a directory would be converted with something like:

 pdftotext /home/dnascopev/Desktop/PDF.pdf /home/dnascopev/Desktop/PDF.txt

Thank you.

Take a look at the portable bitmap utilities, such as pdftoppm, and at gocr, the tool that converts images to text. You could convert to PBM, JPG, etc., and then use gocr to get the text.

I am not sure whether gocr works on PDF files directly, but if not, you can use pdftoppm first.
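
For what it's worth, the image route could be sketched roughly like this. It is only a sketch under assumptions: ocr_pdf is a made-up helper name, pdftoppm and gocr are both assumed to be installed, and mktemp -d is assumed to be available for a scratch directory. OCR is normally only worth it for scanned PDFs that contain no real text.

```shell
# ocr_pdf FILE.pdf: render each page to an image, then OCR the images with gocr.
# Hypothetical helper; assumes pdftoppm, gocr and mktemp -d are available.
ocr_pdf() {
    pdf=$1
    out=${pdf%.[Pp][Dd][Ff]}.txt
    workdir=$(mktemp -d) || return 1
    # -r 300: render at 300 dpi; -gray: one grayscale PGM file per page,
    # written as "$workdir/page-<number>.pgm".
    if pdftoppm -r 300 -gray "$pdf" "$workdir/page"
    then    : > "$out"
            for page in "$workdir"/page-*.pgm
            do      [ -e "$page" ] || continue
                    # gocr prints the recognized text on standard output.
                    gocr -i "$page" >> "$out"
            done
    fi
    rm -rf "$workdir"
}
```

Calling ocr_pdf hp-manual.pdf would then leave the recognized text in hp-manual.txt, with pages appended in the order the shell expands the file name pattern.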

pdftotext works great for converting PDF files to text, but it only seems to do one at a time. Can the command be modified to handle a directory? Thanks.

If you have the source for pdftotext, you can change it to do anything you want. If you don't have the source, or if you want a simple solution, write a shell script that calls pdftotext for each PDF file in your current directory:

for file in *.pdf
do      pdftotext "$file" "${file%.pdf}".txt
done
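
The loop above relies on the ${file%.pdf} expansion, which removes a trailing .pdf from the value of file so that .txt can be appended in its place. A small way to see what names it produces, without running pdftotext at all (txt_name is a made-up helper name used only for this illustration):

```shell
# txt_name FILE: print the output name the loop above would pass to pdftotext.
# Hypothetical helper for illustration only; it does not run pdftotext.
txt_name() {
    # ${1%.pdf} strips one trailing ".pdf" from the first argument.
    printf '%s\n' "${1%.pdf}.txt"
}

txt_name hp-manual.pdf      # prints hp-manual.txt
txt_name "my report.pdf"    # prints my report.txt
```

The double quotes around "$file" in the loop matter for the same reason as in the second call: they keep a file name containing spaces together as one argument.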

So, if the directory is /home/dnascopev/Desktop/PDF, are you saying I can put that in the shell script, or each PDF name, and if so, where? Thank you :).

You would use cd to change directory.

Also, I'd use [pP][dD][fF] in case any of the file names have wonky capitalization.

cd /home/dnascopev/Desktop/PDF
for file in *.[pP][dD][fF]
do
...
done
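
If you would rather not cd at all, one possible variation is to put the directory in front of the pattern. This is only a sketch: convert_dir is a made-up function name, and the text files end up next to the PDF files because the directory stays part of each name.

```shell
# convert_dir DIR: run pdftotext on every PDF directly inside DIR.
# Hypothetical helper; the function name is an assumption for this sketch.
convert_dir() {
    for file in "$1"/*.[pP][dD][fF]
    do      # If nothing matched, the pattern itself is left in $file; skip it.
            [ -e "$file" ] || continue
            pdftotext "$file" "${file%.[pP][dD][fF]}.txt"
    done
}
```

It would be called as convert_dir /home/dnascopev/Desktop/PDF and leaves your current directory unchanged.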

Sorry. By posting in the Shell Programming and Scripting forum, I assumed that you knew how to write and run a shell script.

Making more wild assumptions:

  1. you are using a UNIX or Linux system,
  2. you have more than one directory that contains files you want to process,
  3. you have a bin directory in your home directory, and
  4. $HOME/bin is in your command search path:

then create a file named pdftotextdir in $HOME/bin containing:

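#!/bin/ksh
# Usage: pdftotextdir directory
if [ $# -eq 1 ]
then    cd "$1" || exit 1       # stop if the directory cannot be entered
else    printf 'Usage: %s directory\n' "${0##*/}" >&2
        exit 1
fi
# Convert every file whose name ends in .pdf (in any mix of upper and lower case).
for file in *.[Pp][Dd][Ff]
do      [ -e "$file" ] || continue      # no match leaves the pattern as-is; skip it
        pdftotext "$file" "${file%.[Pp][Dd][Ff]}".txt
done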

(If you don't have a Korn shell, you can change /bin/ksh to /bin/bash or to the pathname of any shell that understands the shell variable expansions required by POSIX.)

Then issue the command:

chmod +x $HOME/bin/pdftotextdir

Then you can run your new utility to use pdftotext on every PDF file in whatever directory you want to process by issuing the command:

pdftotextdir directory

which for your latest request would be:

pdftotextdir /home/dnascopev/Desktop/PDF
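
Since the script takes exactly one directory, handling several directories (assumption 2 above) just means calling it once per directory. A minimal sketch, assuming pdftotextdir is installed as described and using pdftotextdirs as a made-up wrapper name:

```shell
# pdftotextdirs DIR...: call pdftotextdir once for each directory argument.
# Hypothetical wrapper; assumes the pdftotextdir script above is on your PATH.
pdftotextdirs() {
    rc=0
    for dir in "$@"
    do      # Remember a failure, but keep going with the remaining directories.
            pdftotextdir "$dir" || rc=1
    done
    return "$rc"
}
```

For example, pdftotextdirs /home/dnascopev/Desktop/PDF "$HOME/Documents" would process both directories in turn; the second directory is only a placeholder.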