Grep MS Word document

Hi,

I have to search an MS Word document for certain strings (expressions). The document should be read paragraph by paragraph, and whenever a paragraph contains one of the strings/expressions, I have to show the entire paragraph.

Please help me out.

Thanks
Regards
Kris

Is the file in Unicode?

Yes, it's in Unicode.

Try this on your file:

iconv -f UTF-16 -t UTF-8 myfile > temporary_file
grep 'pattern to match' temporary_file

If this works the way you want, try this command to get the whole paragraph:

sed -e '/./{H;$!d;}' -e 'x;/pattern to match/!d;' temporary_file

This assumes a blank line exists between paragraphs.
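
To make the sed command above easier to reuse, here is a minimal sketch that wraps it in a shell function. The name paragraph_grep and its two arguments are made up for illustration. Like the original command, it assumes paragraphs are separated by blank lines, and the pattern must not contain a "/" because it is placed inside a sed /.../ address.

```shell
# paragraph_grep PATTERN FILE   (hypothetical helper name)
# Print every blank-line-separated paragraph of FILE that contains PATTERN.
paragraph_grep() {
    pattern=$1
    file=$2
    # /./{H;$!d;} : append each non-blank line to the hold space and, unless
    #               it is the last line, delete it and start the next cycle.
    # x           : on a blank line (or the last line) swap the collected
    #               paragraph into the pattern space.
    # /PATTERN/!d : drop the paragraph unless it matches PATTERN.
    sed -e '/./{H;$!d;}' -e 'x;/'"$pattern"'/!d;' "$file"
}
```

It would be called as: paragraph_grep 'pattern to match' temporary_file. Each matching paragraph is printed with an empty line in front of it, because H always adds a newline before the appended line.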

Thank you for your time, but it didn't work. I am getting an invalid codeset error when I issue the following command:

iconv -f UTF-16 -t UTF-8 filename > tempfilename

Error
iconv: Invalid codeset: UTF-8: The system cannot find the file specified.
iconv: Invalid codeset: UTF-16: The system cannot find the file specified.

When I issue iconv -l, I get the following code sets:

Character sets: ISO8859-1:1987 8859 ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4
ISO8859-5 ISO8859-6 ISO8859-7 ISO8859-8 ISO8859-9 CP037 EBCDIC CP273 CP277 CP278
CP280 CP284 CP285 CP297 CP437 CP500 CP850 CP852 CP857 CP860 CP863 CP865 CP866
CP870 CP871 CP905 ISO646 646 C

Thanks in advance
Regards
kris

You will have to do some reading:
A Quick Primer On Unicode and Software Internationalization Under Linux and UNIX

This explains how to use Unicode on Unix.
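
For the iconv error reported earlier in the thread, where neither UTF-8 nor UTF-16 appears in the iconv -l list, one possible fallback is to drop the extra bytes with tr. This is only a rough sketch, not a real conversion: it assumes the text is UTF-16 and contains nothing but ASCII-range characters, so every character is stored as one ASCII byte plus one NUL byte. The function name utf16_ascii_to_text is made up for illustration. Anything outside the ASCII range will come out garbled, and carriage returns are left untouched.

```shell
# utf16_ascii_to_text FILE   (hypothetical helper name)
# Assumption: FILE is UTF-16 text whose characters are all in the ASCII range.
# Deletes the NUL byte that pads each character (\000) and the two
# byte-order-mark bytes (\377 and \376), leaving plain ASCII text.
utf16_ascii_to_text() {
    LC_ALL=C tr -d '\000\377\376' < "$1"
}
```

The result can then be searched the same way as before:

utf16_ascii_to_text myfile > temporary_file
grep 'pattern to match' temporary_file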