grep high bit char

Hi -

I have a file which contains high-bit Unicode characters. How can I use grep to find the lines which contain the copyright symbol ©?

I tried using

grep \x{00A9}
grep \x\{00A9\}

Thanks-

Any suggestion ?

I need to use grep only..

Try this:

grep '©' filename

How do you type '©' in Unix? I am not sure whether you can type it in Unix...

In windows I can type it using 'Alt+0169'..

'©' in Unix is:

Press Shift+Alt+0 simultaneously.

Thanks for your reply.

However, I am not able to type © in Unix

I tried shift+alt+0...
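
If the character cannot be typed at the keyboard, one option is to build the pattern from its byte values instead. A minimal sketch, assuming the file is UTF-8 encoded (where © is the two bytes C2 A9, octal 302 251); sample_utf8.txt is a made-up file name used only for this illustration:

```shell
# Create a small sample file; \302\251 are the UTF-8 bytes of U+00A9 (assumed encoding).
printf 'Copyright \302\251 2008\nno symbol here\n' > sample_utf8.txt

# Build the pattern from the same bytes, so nothing has to be typed directly.
pattern=$(printf '\302\251')

# LC_ALL=C makes grep compare raw bytes regardless of the terminal's locale.
LC_ALL=C grep -n "$pattern" sample_utf8.txt
```

For a Latin-1 (ISO-8859-1) file the symbol is the single byte A9, so the pattern would be built with printf '\251' instead.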

POSIX grep is only specified for text files, and many implementations do not look past a NUL character. 00A9 is the Unicode code point for the character you want. If the file is stored as UTF-16, one of its two bytes is 00, the NUL character.

grep will not do what you need. Consider writing something in C that reads short integers (2-byte integers) from the file and compares each one with 169 (0xA9). When you find 169, its position is the character offset in the file where the symbol is.
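
An alternative to writing a C program, sketched here under the assumption that the file is UTF-16 little-endian and that iconv is installed, is to convert the data to UTF-8 first and let grep search the converted text. The file name sample_utf16le.txt is hypothetical:

```shell
# Build a tiny UTF-16LE sample: "a©" on line 1, "b" on line 2.
# Each character is two bytes, low byte first, so © (U+00A9) is stored as A9 00.
printf 'a\000\251\000\n\000b\000\n\000' > sample_utf16le.txt

# Convert to UTF-8, then search for the UTF-8 bytes of © (C2 A9).
iconv -f UTF-16LE -t UTF-8 sample_utf16le.txt | LC_ALL=C grep -n "$(printf '\302\251')"
```

The reported line numbers refer to the converted text. For a big-endian file the -f argument would be UTF-16BE.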

You are probably better off using a Windows editor.

Found a version of grep from mkssoftware that claims to support Unicode:
grep, egrep, fgrep -- match patterns in a file

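GNU grep has a -U (--binary) switch that treats files as binary, and a -a (--text) switch that searches binary data as if it were text. Neither one decodes UTF-16, so the pattern still has to match the raw bytes.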
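
As a rough grep-only sketch, assuming GNU grep and a UTF-16 little-endian file: the -a option makes grep treat binary data as text, so a search for the raw byte A9 finds the lines holding the symbol. The file name is made up for the example, and the match is only approximate, because any other character whose encoding contains the byte A9 would match as well:

```shell
# Sample UTF-16LE data: "a©" then "b"; © (U+00A9) is stored as the bytes A9 00.
printf 'a\000\251\000\n\000b\000\n\000' > sample_binary.txt

# -a: process the file as text even though it contains NUL bytes (GNU grep).
# -c: print only the number of matching lines.
LC_ALL=C grep -a -c "$(printf '\251')" sample_binary.txt
```

Replacing -c with -n shows the matching lines themselves, although the embedded NUL bytes may not display cleanly in a terminal.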

Thanks Jim.

I will try to see what I can do to implement it...