How to search in MS Word files and PDFs

Hi,

Please let me know if there is a UNIX command or shell script that can search for text in MS Word files, PDFs, and Excel files. Is there a solution for this challenge?

Thanks,
Kesava.

strings is one utility that can help: it prints the runs of printable characters found in a binary file, and you can pipe its output to grep to look for a specific string.
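As a rough sketch of that idea, the function below runs strings on each file and reports the files whose output matches. The name strings_grep is made up for this example and is not a standard command. Note that this only finds text stored as plain single-byte characters; it will likely miss text in compressed formats (.docx, .xlsx, many PDFs) and text stored as 16-bit characters.

```shell
# Hypothetical helper (not a standard command): print the names of the
# files whose printable strings contain PATTERN, ignoring case.
# Usage: strings_grep PATTERN FILE...
strings_grep() (
    pattern=$1
    shift
    for f in "$@"; do
        # strings extracts runs of printable characters from the file;
        # grep -q only reports (via exit status) whether PATTERN occurs.
        if strings "$f" | grep -qi -e "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
)

# Example call (file names are placeholders):
#   strings_grep 'budget' *.doc
```

The body runs in a subshell (parentheses instead of braces) so the helper's variables do not leak into your interactive shell.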

I'm not aware of a standard UNIX tool that can search PDFs directly, but on Linux there is a tool named pdftotext, which converts Portable Document Format (PDF) files to plain text. You can then search that text with any UNIX command, e.g. grep.
You can look for a counterpart tool for MS Word .doc files.
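A minimal sketch of the pdftotext plus grep combination, assuming pdftotext is installed. The function name pdf_grep is invented for this example. Giving "-" as the output file tells pdftotext to write the text to standard output, so no intermediate file is needed.

```shell
# Hypothetical helper: search PDF files for PATTERN (case-insensitive)
# and print matches as FILE:LINE_NUMBER:TEXT.
# Usage: pdf_grep PATTERN FILE.pdf...
pdf_grep() (
    pattern=$1
    shift
    for f in "$@"; do
        # "-" as the output name sends the extracted text to stdout.
        # Line numbers refer to the extracted text, not to PDF pages.
        pdftotext "$f" - 2>/dev/null |
        grep -in -e "$pattern" |
        while IFS= read -r line; do
            printf '%s:%s\n' "$f" "$line"
        done
    done
)

# Example call (file names are placeholders):
#   pdf_grep 'invoice' *.pdf
```

Scanned PDFs that contain only images will produce no text this way; those would need OCR first.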

Hi.

Often such tasks are multi-step. In this case, a first step might be to convert the MS Word and PDF files to text, and a second step to use a member of the grep family to do the actual search.

Here are a few programs I noted after running:

man -k word

and

man -k pdf

Your system may have different utilities.

antiword (1)         - show the text and images of MS Word documents
catdoc (1)           - reads MS-Word file and puts its content as plain text ...
wordview (1)         - displays text contained in MS-Word file in X window
wvWare (1)           - convert msword documents
pdftotext (1)        - Portable Document Format (PDF) to text converter (vers...
pdftk - useful tool for manipulating PDF documents
pstotext - Extract text from PostScript and PDF files
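Putting the two steps together with some of the tools listed above could look roughly like this. It is only a sketch: the function names to_text and doc_search are made up, it assumes pdftotext and antiword are installed, and the xls2csv line is an extra assumption (that tool normally ships with catdoc and was not in the list above). Swap in whatever converters your system has.

```shell
# Hypothetical helper: write the text of one document to stdout,
# picking a converter from the file extension (case-sensitive).
to_text() {
    case $1 in
        *.pdf) pdftotext "$1" - ;;   # "-" means write to stdout
        *.doc) antiword "$1" ;;      # catdoc "$1" is an alternative
        *.xls) xls2csv "$1" ;;       # assumption: installed with catdoc
        *)     cat "$1" ;;           # treat anything else as plain text
    esac
}

# Hypothetical helper: list the documents under DIR (default: current
# directory) whose converted text contains PATTERN, ignoring case.
# Usage: doc_search PATTERN [DIR]
doc_search() (
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f \( -name '*.pdf' -o -name '*.doc' \
        -o -name '*.xls' -o -name '*.txt' \) |
    while IFS= read -r f; do
        # Step 1: convert to text. Step 2: let grep do the search.
        if to_text "$f" </dev/null 2>/dev/null | grep -qi -e "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
)

# Example call (directory name is a placeholder):
#   doc_search 'quarterly report' ~/Documents
```

The loop reads one file name per line, so names containing newlines are not handled. The newer .docx and .xlsx formats are zip archives and need a different converter than antiword.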

A Google search for "linux convert pdf to text" produces 47 million hits.

Best wishes ... cheers, drl

You could convert the file to text and then try grep.