Find Unicode Character in File

azelinsk · April 11, 2008, 5:40pm

I have a very large file on a Unix system that I would like to search for all instances of the Unicode character 0x17. I need to remove these characters because they are causing my SAX parser to throw an exception. Does anyone know how to find a Unicode character in a file?

Thank you for your assistance.

fpmurphy · April 11, 2008, 9:46pm

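"0x17" on its own is a byte value, not a Unicode (UTF-16 or UTF-32)
encoded character per se. As a code point it is U+0017, the ETB
control character, which XML 1.0 does not allow. How it appears in
your file depends on the encoding.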

For those not familiar with Unicode, UTF-16 basically means that
every "character" is stored as 2 bytes (4 bytes for characters outside
the Basic Multilingual Plane) whereas UTF-32 means every "character"
is stored as 4 bytes.

On a practical level, it means that standard ASCII characters are
either preceded by NULs (0x00) when data storage is Big-Endian or
followed by them when it is Little-Endian: a single NUL in UTF-16,
3 NULs in UTF-32.
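To make the byte layouts concrete, here is a small Python sketch that prints how the letter "A" and U+0017 are stored in each encoding. The helper name byte_layout is just something made up for this illustration.

```python
# Encodings to compare; BE = Big-Endian, LE = Little-Endian.
ENCODINGS = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]


def byte_layout(ch):
    """Return {encoding: space-separated hex bytes} for one character.

    byte_layout is a hypothetical helper, used only for this illustration.
    """
    layout = {}
    for enc in ENCODINGS:
        data = ch.encode(enc)
        layout[enc] = " ".join(f"{b:02x}" for b in data)
    return layout


for ch in ("A", "\x17"):
    for enc, hexbytes in byte_layout(ch).items():
        print(f"U+{ord(ch):04X}  {enc:<9}  {hexbytes}")
```

In UTF-16 Big-Endian, U+0017 comes out as the two bytes 00 17, and in UTF-16 Little-Endian as 17 00, so a plain byte search for 0x17 only makes sense once you know which of these layouts the file uses.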

Which Unicode "format" is your file using?
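Once you know the encoding, removing the character is mostly a matter of decoding, filtering and re-encoding. Below is a rough Python sketch, not a definitive solution: the function name strip_char, its parameters and the chunk size are all assumptions of mine, and you have to pass in the right encoding name yourself. It reads the file in chunks so a very large file does not need to fit in memory.

```python
def strip_char(src, dst, encoding, bad="\x17", chunk_size=1 << 20):
    """Copy src to dst, dropping every occurrence of `bad`.

    Hypothetical helper: `encoding` must name the file's real encoding
    (for example "utf-8" or "utf-16-le"). Returns the number of
    characters removed.
    """
    removed = 0
    # newline="" leaves line endings exactly as they are in the input.
    with open(src, "r", encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        while True:
            # In text mode, read() counts decoded characters, so a
            # multi-byte character is never split across two chunks.
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(bad)
            fout.write(chunk.replace(bad, ""))
    return removed
```

A call would look like strip_char("in.xml", "out.xml", "utf-8"), where the file names are placeholders. The return value tells you how many instances were found, which also answers the "find" part of the question.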