Hi,
I have to read an MS Word document to find some strings (expressions). The reading should be done by paragraph: if I find any of the strings/expressions in a paragraph, I have to show the entire paragraph.
Please help me out.
Thanks
Regards
Kris
Is the file in Unicode?
Yes, it's in Unicode.
Try this on your file:
iconv -f UTF-16 -t UTF-8 myfile > temporary_file
grep 'pattern to match' temporary_file
If this works the way you want, try this command to get the whole paragraph:
sed -e '/./{H;$!d;}' -e 'x;/pattern to match/!d;' temporary_file
This assumes a blank line exists between paragraphs.
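To show what the sed command does step by step, here it is wrapped in a shell function, next to an alternative that uses awk's paragraph mode (`RS=""`), which is a different technique doing the same job. The function names `grep_paragraphs` and `awk_paragraphs` are hypothetical. This is a sketch that assumes paragraphs are separated by blank lines and that the file has already been converted to plain text (ASCII or UTF-8).

```shell
#!/bin/sh
# grep_paragraphs PATTERN FILE (hypothetical helper name)
# Prints each blank-line-separated paragraph of FILE that matches the basic
# regular expression PATTERN, using the sed command from the answer above.
grep_paragraphs() {
    # /./{H;$!d;}  : a non-empty line is appended to the hold space and
    #                deleted, unless it is the last line of input.
    # x;/PAT/!d;   : on a blank line (or at end of input) the collected
    #                paragraph is swapped into the pattern space and deleted
    #                unless it matches, so only matching paragraphs print.
    # Each printed paragraph begins with an empty line, because H adds a
    # newline before the first line it appends. PATTERN must not contain '/'.
    sed -e '/./{H;$!d;}' -e 'x;/'"$1"'/!d;' "$2"
}

# awk_paragraphs PATTERN FILE (hypothetical helper name)
# Same job using awk's paragraph mode: with RS="" every group of lines
# separated by blank lines is one record. PATTERN is an extended regex.
awk_paragraphs() {
    awk -v pat="$1" 'BEGIN { RS = "" } $0 ~ pat { print; print "" }' "$2"
}
```

After the iconv step, `grep_paragraphs 'pattern to match' temporary_file` prints only the paragraphs that contain a match. The awk version is often easier to read, and it prints a blank line after each paragraph rather than before it.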
Thank you for your time, but it didn't work. I get an "Invalid codeset" error when I issue the following command:
iconv -f UTF-16 -t UTF-8 filename > tempfilename
Error:
iconv: Invalid codeset: UTF-8: The system cannot find the file specified.
iconv: Invalid codeset: UTF-16: The system cannot find the file specified.
When I issue iconv -l, I get the following code sets:
Character sets: ISO8859-1:1987 8859 ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4
ISO8859-5 ISO8859-6 ISO8859-7 ISO8859-8 ISO8859-9 CP037 EBCDIC CP273 CP277 CP278
CP280 CP284 CP285 CP297 CP437 CP500 CP850 CP852 CP857 CP860 CP863 CP865 CP866
CP870 CP871 CP905 ISO646 646 C
Thanks in advance
Regards
Kris
You will have to do some reading:
A Quick Primer On Unicode and Software Internationalization Under Linux and UNIX
It explains how to use Unicode on UNIX.
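The iconv -l listing in Kris's post has no UTF-8 or UTF-16 entry, which is consistent with the "Invalid codeset" errors: that iconv build seems to have no Unicode converters at all. The sketch below checks for UTF-16 support and otherwise falls back to a crude workaround. The fallback ASSUMES the file is UTF-16 text whose characters are all ASCII. In that case each character is one ASCII byte plus one NUL byte, so deleting the NULs (and the FF/FE byte-order-mark bytes) leaves plain text. Any non-ASCII character will be mangled. The function names are hypothetical.

```shell
#!/bin/sh
# has_utf16 (hypothetical helper name)
# Succeeds if `iconv -l` lists a codeset named like UTF-16 / UTF16
# (case-insensitive); fails if iconv is missing or has no such entry.
has_utf16() {
    iconv -l 2>/dev/null | tr ' ,' '\n\n' | grep -qi '^utf-*16'
}

# utf16_ascii_to_text FILE (hypothetical helper name)
# Crude fallback, ASSUMING FILE is UTF-16 containing only ASCII characters:
# deletes NUL bytes and the 0xFE/0xFF byte-order-mark bytes. LC_ALL=C makes
# tr treat the octal escapes as single bytes.
utf16_ascii_to_text() {
    LC_ALL=C tr -d '\000\376\377' < "$1"
}

# to_utf8 FILE (hypothetical helper name)
# Uses a real UTF-16 -> UTF-8 conversion when available, else the fallback.
to_utf8() {
    if has_utf16; then
        iconv -f UTF-16 -t UTF-8 "$1"
    else
        utf16_ascii_to_text "$1"
    fi
}
```

For example, `to_utf8 filename > tempfilename` could replace the failing iconv command, after which grep or the paragraph sed command runs on tempfilename.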