I want to use a script (preferably awk) which determines if the first character in a line is double-byte (as in Japanese or Chinese) and deletes it.
For example:
(in the above quote, I see Japanese on my screen for two lines - with 2 characters in the first and 3 characters in the second - you may see random symbols)
Thanks daptal - but that's not what I need. I need exactly what I stated - detecting only the lines that have a double-byte character in the first position.
I am the one who wrote the file, so I know where the character sets came from.
Not sure I understand your question though. The non-Japanese characters are all single-byte characters (I am using vim). The Japanese characters use the "Double Byte Character Set" (DBCS).
I want to keep it general so that Chinese and Korean characters are also recognized - which should work by detecting DBCS characters. There must be a straightforward way ... ?
The string "\200" represents a single character with octal value 200, which in binary is 10000000, i.e. the most significant bit is set to 1.
So, the supplied awk code prints lines where the first character's most significant bit is not set.
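Building on the same "\200" comparison, here is a rough sketch of how the original request (delete the leading character only when it is double-byte) might be done. It rests on assumptions that are not confirmed anywhere in this thread: that the file is in a legacy two-byte encoding such as EUC-JP, EUC-KR, GBK or Big5, where every double-byte character starts with a byte whose most significant bit is set, and that awk runs in the C locale so that substr() counts bytes. The function name strip_leading_dbcs is made up for illustration.

```shell
# Hypothetical helper: reads lines on stdin and removes a leading
# two-byte character from each line that starts with one.
# Assumption: legacy double-byte encoding (EUC-JP, EUC-KR, GBK, Big5, ...).
strip_leading_dbcs() {
    # LC_ALL=C makes awk treat the input as raw bytes.
    LC_ALL=C awk '{
        if (substr($0, 1, 1) >= "\200")   # first byte has its high bit set
            print substr($0, 3)           # skip the two bytes of that character
        else
            print                         # line starts with a single-byte character
    }'
}
```

It could be run as `strip_leading_dbcs < input.txt > output.txt` (file names are placeholders). Two caveats: in Shift-JIS the half-width katakana are single bytes with the high bit set, so this test would wrongly remove two bytes there, and in UTF-8 Japanese characters are typically three bytes long, so the fixed offset of 3 would not be right for a UTF-8 file.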