Extract Table from PDF

parshant_bvcoe · December 20, 2008, 2:03pm

Hi Guys!

I want to extract a table from a PDF file and convert it into HTML. Can this be done with a shell script? Any suggestions would be highly appreciated. Thanks!

fpmurphy · December 20, 2008, 8:30pm

The short answer is: probably not. The long answer is that it depends on how your PDF document was constructed. If the table you want was embedded as a standard table structure, then yes, it is possible, but difficult.
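One rough first check for whether a PDF carries any logical structure at all is the "Tagged:" field that pdfinfo (from poppler or xpdf) prints. This is a minimal sketch that assumes that output format; the function name pdf_is_tagged is hypothetical. Note that "yes" only means the document has a structure tree, not that the table itself was tagged as a table.

```shell
# pdf_is_tagged (hypothetical helper): reads pdfinfo output on stdin and
# prints the value of its "Tagged:" field ("yes" or "no").
# Prints "unknown" if the field is missing (e.g. an older pdfinfo).
pdf_is_tagged() {
    awk '
        /^Tagged:/ { print $2; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}
```

Usage, with report.pdf as a placeholder file name: pdfinfo report.pdf | pdf_is_tagged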

drl · December 20, 2008, 9:10pm

Hi.

You may wish to check whether your system has these tools (or others like them), or search for them on Google:

pdftotext (1) - Portable Document Format (PDF) to text converter (version 3.00)

pdftk - useful tool for manipulating PDF documents

-- from a search of the Debian repository

I have used pdftk for a few things, so I know that it works for some tasks.

If those tools don't give you what you need directly, but you can at least get the text out, you may be able to use other tools to convert that text to HTML. You didn't give us much to go on, so you will need to decide whether the result is worth the time you would have to invest.
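A common fallback is to run pdftotext with its -layout option, which tries to preserve the physical column layout, and then split each line into table cells. The sketch below assumes that columns are separated by two or more spaces and that each table row sits on a single line; wrapped cells, merged cells, and multi-line headers will still need hand-editing. The function name text_to_html_table is hypothetical.

```shell
# text_to_html_table (hypothetical helper): reads layout-preserved text on
# stdin and writes a bare HTML table to stdout.
# Assumption: a run of 2+ spaces marks a column break; single spaces are
# kept inside a cell. Blank lines are skipped.
text_to_html_table() {
    awk '
        # Escape the characters that are special in HTML text.
        function esc(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN { print "<table>" }
        {
            line = $0
            sub(/^ +/, "", line)            # drop leading indentation
            sub(/ +$/, "", line)            # drop trailing spaces
            if (line == "") next            # skip blank lines
            n = split(line, cells, /  +/)   # 2+ spaces = column break
            row = "<tr>"
            for (i = 1; i <= n; i++)
                row = row "<td>" esc(cells[i]) "</td>"
            print row "</tr>"
        }
        END { print "</table>" }
    '
}
```

For example, to convert only page 3 (report.pdf and the page number are placeholders; "-" sends pdftotext output to stdout): pdftotext -layout -f 3 -l 3 report.pdf - | text_to_html_table > table.html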

Best wishes ... cheers, drl