Dear Friends,
I am using Ubuntu 15.10, 34 bit system. I added a Nautilus-Actions action that runs a shell script to convert PDF files to text. There are 2 types of PDF:
- Scanned PDF (no OCR text layer): the conversion to text works, and as the last step the text file should open in gedit. But gedit shows a blank file, even though the real file on disk contains the text.
- Normal PDF (searchable): it works fine.
I have added my code below for your reference. Please advise what I should do to avoid this issue.
#!/bin/bash
# Quote the directory so paths with spaces work; stop if it cannot be entered
cd "$1" || exit 1
if [[ $2 = *.pdf ]]; then
#echo pdf > "anes.txt"
MYFONTS=$(pdffonts -l 5 "$3" | tail -n +3 | cut -d' ' -f1 | sort | uniq)
if [ "$MYFONTS" = '' ] || [ "$MYFONTS" = '[none]' ]; then
#Scanned PDF
convert -density 300 "$3" "${3%.*}.tiff"
# tesseract appends .txt to its output base, so pass the name without .pdf
tesseract "${3%.*}.tiff" "${3%.*}"
sleep 2
rm -f "${3%.*}.tiff"
gedit "${3%.*}.txt"
else
pdftotext "$3"
gedit "${3/%.pdf/.txt}"
fi
elif [[ $2 = *.tif ]] || [[ $2 = *.tiff ]] || [[ $2 = *.jpg ]] || [[ $2 = *.jpeg ]] || [[ $2 = *.png ]] || [[ $2 = *.gif ]]; then
tesseract "$3" "${3%.*}"
gedit "${3%.*}.txt"
else
# Not implemented case...
#echo Nothing to do > "anes.txt"
# An else branch holding only comments is a syntax error in bash, so use a no-op
:
fi
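A minimal sketch of how the file names in the script relate to each other, assuming tesseract writes its text to the output base it is given plus ".txt". The helper names txt_name_for and greedy_name_for are hypothetical and only illustrate the two parameter expansions:

```shell
#!/bin/bash
# Hypothetical helpers, not part of the script above.

# Strip only the last extension, then add .txt.
# ${input%.*} removes the shortest trailing ".something".
txt_name_for() {
    local input="$1"
    printf '%s\n' "${input%.*}.txt"
}

# ${input/%.*} deletes the longest match of ".*" anchored at the end,
# i.e. everything from the first dot anywhere in the path.
greedy_name_for() {
    local input="$1"
    printf '%s\n' "${input/%.*}.txt"
}

txt_name_for "/tmp/my.docs/scan.pdf"     # prints /tmp/my.docs/scan.txt
greedy_name_for "/tmp/my.docs/scan.pdf"  # prints /tmp/my.txt
```

For a plain name such as scan.pdf both forms give scan.txt, but a tesseract output base of scan.pdf would produce scan.pdf.txt, which is a different file from the scan.txt that gedit is asked to open.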
Waiting for your fast response
Thanks
Anes