Finding Special Characters in Vi

Hi,

I have a file on Unix containing many XML messages that come from a messaging queue, and some of them contain special characters. The load into the database failed because of these special characters. Initially I was not able to detect them when I copied and pasted the content into a Windows editor, as each line has more than 1000 characters.

So I emailed the file to Outlook, opened it in TextPad, copied one line, and pasted it into XML Notepad. Then I was able to detect it.

I want to search for special characters in the vi editor on Unix. My version of Unix is AIX xxxxxxxxx 1 6 00F736154C00

For example, let's say my file has the following content. How can I find special characters that are not on the keyboard?

�sldfkjsd
sdlfkjsdlfk
lskdfjsldfj
slskdfjsldfk
flsdkfjsl�
sdlfkjsdlk
lsdkfjs
sldfksdjlfk�
sljfshl
lsjdfkdj
sldkfjsldk

You could search for non-word characters, if your version of vi supports it: /\W

/\W finds every "W" in the file. It does not find special or non-word characters.

Are you looking to remove these "special" characters or just find them? What codeset are you using? These "special" characters may actually be part of the codeset and be required.

Right now I just want to find them. I was able to find only one character, shown as a right arrow, but I would like to see whether there are any other special characters. Since each line has more than 2000 characters, they are hard to find by eye.

The simplest tool is probably the cat utility, if your version of cat has extensions such as -v, -e, and -t.
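
As a rough sketch of that suggestion: the sample file and its contents below are made up for illustration, and it assumes a cat that accepts -v, -e and -t, which is worth checking against the local man page first.

```shell
# Hypothetical sample file standing in for the real data file.
sample=$(mktemp)
printf 'plain line\nhas\ttab and \001 control\n' > "$sample"

# -v prints control characters as ^X, -t prints tabs as ^I,
# and -e marks the end of every line with $.
shown=$(cat -vet "$sample")
printf '%s\n' "$shown"

rm -f "$sample"
```

With the common cat implementations, bytes above 0x7F are shown with an M- prefix, so anything that is not plain text should stand out even in very long lines.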

After googling, I found the following:

Vi

/[^0-9a-zA-Z,_&-\/<>?=\"\':\\\. *]

Unix Shell

grep -n "[^0-9a-zA-Z,_&-\/<>?=\"\':\\\. *]" filename

Thanks for the contribution.

Assuming that you're using a common US keyboard, the above expression will also find several characters that are on the keyboard, including but not limited to {, }, |, [, and ], and it contains a few unneeded backslashes. The following is a more complete vi search command:

/[^][[:space:]0-9a-zA-Z~!@#$%^&*()_+`={}|\\;':"<>?,./-]

Using the above search command on the output from the OS X command man 7 ascii only matches backspace characters. If you would also like to skip backspace and other control characters, you could use:

/[^][[:space:][:cntrl:]0-9a-zA-Z~!@#$%^&*()_+`={}|\\;':"<>?,./-]

The order of most of these characters doesn't matter, but the expression has to start with [^] to begin a non-matching bracket expression that excludes ], and the - has to be the last character before the closing ] to exclude the minus sign. (Your expression seems to have excluded - by accident: in ASCII, the range expression &-\/ (or equivalently &-/) in a non-matching expression excludes &, ', (, ), *, +, the comma, -, ., and /, and you also exclude several of these characters individually.) Of course, [:space:] and [:cntrl:] have to remain intact as sequences, but they can appear anywhere within the bracket expression.
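
To see the accidental range for yourself, here is a small sketch. It assumes a grep that supports -o (not required by POSIX, so it may be missing on some systems), and the variable names are only for illustration.

```shell
# The ten punctuation characters that sit between & (0x26) and / (0x2F) in ASCII.
chars="&'()*+,-./"

# LC_ALL=C makes the range follow ASCII byte order; -o prints every match
# on its own line, so counting lines counts the matched characters.
matched=$(printf '%s\n' "$chars" | LC_ALL=C grep -o '[&-/]' | wc -l | tr -d ' ')
printf '%s\n' "$matched"
```

All ten characters are matched by the single range &-/, which is why listing some of them again individually is redundant.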


@Don Cragun,
I slightly modified your code.
I added an escape character for "`" and it worked.

 grep -n "[^][[:space:]0-9a-zA-Z~!@#$%^&*()_+\`1={}|\\;':\"<>?,./-]"  filename

You can search for a negated range, such as anything that is neither a tab nor in the range space through tilde: [^^I -~]
Here ^I is tab (0x09) and ~ is 0x7E, so type that as [ ^ tab space - ~ ]
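
The same negated range can be tried from the shell with grep. This is only a sketch: the sample file is invented, and the tab and sample variable names are placeholders.

```shell
# Hypothetical sample file: line 2 has a control byte, line 3 a high byte.
sample=$(mktemp)
printf 'plain <xml attr="1"/>\nbad \001 here\ncaf\351\n' > "$sample"

tab=$(printf '\t')   # a literal tab to place inside the bracket expression

# Report the numbers of the lines containing a byte that is neither a tab
# nor in the range space through tilde. LC_ALL=C keeps the range in
# ASCII byte order.
hits=$(LC_ALL=C grep -n "[^${tab} -~]" "$sample" | LC_ALL=C cut -d: -f1)
printf '%s\n' "$hits"

rm -f "$sample"
```

Dropping the cut stage prints the offending lines themselves along with their line numbers.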

You can add any other special characters that are OK in your context, like form feed ^L, carriage return ^M and backspace ^H. If they do not show on your screen, display the current line with the ex command l (list): :.l
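
Outside the editor, sed has an l command that writes lines in a similar unambiguous form. This is a different tool from the ex command mentioned above, shown only as a sketch with made-up input.

```shell
# Hypothetical one-line sample containing a tab and a control byte.
# sed's l command writes tabs as \t, other non-printable bytes as
# three-digit octal escapes, and marks the end of the line with $.
listed=$(printf 'a\tb\001c\n' | sed -n l)
printf '%s\n' "$listed"
```

Very long lines are folded by sed in this output, so it suits inspecting a suspicious line more than scanning the whole file.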

Yes. The search pattern I gave was just the one that works for vi. When you pass the expression as an argument through a shell to grep (or some other utility), you have another level of quoting to worry about. With some shells you have to escape the backquote to keep it from being treated as the start of a command substitution inside a double-quoted string. In other shells, that backslash wouldn't matter. With some shells, you might also have to escape the dollar sign. Note also that the "1" following the backquote in your command was a typo on my part. It won't hurt anything, but it isn't needed. I have also corrected it in my earlier message.
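
One possible way to sidestep the extra quoting level, offered as an alternative rather than something used earlier in this thread, is to store the pattern in a file and let grep read it with -f. The file names and sample data here are hypothetical.

```shell
# Hypothetical sample file: only line 2 contains a control byte.
sample=$(mktemp)
pattern=$(mktemp)
printf 'ok: {a} | [b] ~!@#$%%^&*()_+`=;"<>?,./-\nbad \001 here\n' > "$sample"

# A quoted here-document delimiter ('EOF') turns off all shell expansion,
# so the vi pattern can be stored exactly as typed.
cat > "$pattern" <<'EOF'
[^][[:space:]0-9a-zA-Z~!@#$%^&*()_+`={}|\\;':"<>?,./-]
EOF

# -f reads the pattern from the file instead of the command line.
hits=$(LC_ALL=C grep -n -f "$pattern" "$sample" | LC_ALL=C cut -d: -f1)
printf '%s\n' "$hits"

rm -f "$sample" "$pattern"
```

Because the shell never parses the pattern, the backquote, dollar sign and quote characters need no escaping, whichever shell is in use.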
