wget -i genedx.txt
The command above downloads multiple PDF files from a site, but how can I download them and convert them to .txt?
I have attached the master list (genedx.txt, which contains the URLs and file names)
as well as the two PDFs that are downloaded. I would like those two files to end up as text files. Thank you.
Is that a separate command, or can it be used with the wget command? Thanks.
It is a separate command, which -- like any other separate command -- you can use with wget, either by piping the output or by feeding the resulting file into it once wget is done.
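As a rough sketch of the second option (convert the files once wget has finished), something like the following could work. It assumes pdftotext from Poppler or Xpdf is installed and on your PATH. The function name convert_pdfs and the PDF2TXT override variable are made up for this illustration and are not part of either tool.

```shell
#!/bin/sh
# Sketch only: convert every PDF in a directory to a .txt file next to it.
# Assumes pdftotext (Poppler/Xpdf) is installed; PDF2TXT is a hypothetical
# override so a different converter can be swapped in.

convert_pdfs() {
    dir=${1:-.}                      # directory to scan, default: current one
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue      # glob matched nothing, so skip
        # pdftotext INPUT OUTPUT writes the extracted text to OUTPUT
        "${PDF2TXT:-pdftotext}" "$f" "${f%.pdf}.txt"
    done
}

# Typical use, left commented out so defining the function has no side effects:
# wget -i genedx.txt && convert_pdfs .
```

The && means the conversion only runs if wget exits successfully. Quoting "$f" keeps file names with spaces in one piece.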
So would the command be:
wget -i genedx.txt | info_sheet_ube.pdf Info_Sheet_XomeDx.pdf
And where do I download pdftotext? Thanks.
No, pipes do not work that way.
What you would actually do depends on the contents of genedx.txt, and what you want to do with it.
Here is the second Google hit.
After installing PDFMiner, do batch conversion with a for loop. Nothing to do with pipe here.
$ for f in *.pdf; do pdf2txt.py "$f" > "${f%.pdf}.txt"; done
cmccabe
September 9, 2014, 1:35pm
So I would just change to the directory containing the 4 PDF files:
cd "C:\Users\cmccabe\Desktop\PDF"
followed by:
for f in *.pdf; do pdf2txt.py "$f" > "${f%.pdf}.txt"; done
Thanks.