Extra character with iconv encoding

hey,

I am trying to convert a sample file from a Russian encoding to an English encoding using the iconv utility.

It's almost working, but with each converted character I get one extra character that should not be there.

My sample Russian text is:

test.txt

           ~

and the script I am using for the conversion is:

script

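# Start with an empty output file
>out
# Try every target encoding name that iconv reports
for i in $(iconv -l)
do
    o=$(iconv -f cp866 -t "$i" test.txt)
    len=$(expr length "$o")
    # Keep only conversions that produced more than 2 characters
    if [ "$len" -gt 2 ]
    then
        echo "$o#$i" >>out
    fi
done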

and sample output for a few of the almost successfully converted encodings is:

out

@ A B C D E G H I J K ~	CP932
@ A B C D E G H I J K ~	CSIBM932
@ A B C D E G H I J K ~	CSIBM943
@ A B C D E G H I J K ~	CSSHIFTJIS
@ A B C D E G H I J K ~	CSWINDOWS31J
@ A B C D E G H I J K ~	IBM-932
@ A B C D E G H I J K ~	IBM-943
@ A B C D E G H I J K ~	IBM932
@ A B C D E G H I J K ~	IBM943
@ A B C D E G H I J K ~	MS932
@ A B C D E G H I J K ~	MS_KANJI
@ A B C D E G H I J K ~	SHIFT-JIS
@ A B C D E G H I J K ~	SHIFT_JIS
@ A B C D E G H I J K ~	SHIFT_JISX0213
@ A B C D E G H I J K ~	SJIS-OPEN
@ A B C D E G H I J K ~	SJIS-WIN
@ A B C D E G H I J K ~	SJIS
@ A B C D E G H I J K ~	WINDOWS-31J

Please suggest where I am going wrong in this encoding process.

Any help with that would be greatly appreciated.

---------- Post updated 06-14-11 at 08:08 AM ---------- Previous update was 06-13-11 at 09:20 PM ----------

Hey guys, can anyone help me with this?

What is the output of this command?

iconv -f CP866 -t UTF-8 test.txt

The output is fine. Change the encoding in your terminal or editor to Shift_JIS and you will see "pure" Cyrillic letters. Sometimes you may see an extra character (like so ); it is the lead byte of the two-byte sequence used for Cyrillic (and some other) letters.
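To see where the extra character comes from, here is a minimal sketch, assuming an iconv build that knows both CP866 and SHIFT_JIS, plus `od` for the hex dump. The helper name `sjis_hex` is made up for illustration. It pushes the single CP866 byte 0x80 (Cyrillic capital А) through iconv and prints the bytes that come out.

```shell
#!/bin/sh
# Hypothetical helper: convert CP866 input on stdin to Shift_JIS
# and print the resulting bytes as one hex string.
sjis_hex() {
    hex=$(iconv -f CP866 -t SHIFT_JIS | od -An -tx1 | tr -d ' \n')
    printf '%s\n' "$hex"
}

# \200 is octal for 0x80, the CP866 code for Cyrillic capital A.
# One input byte becomes two output bytes: lead byte 84, trail byte 40.
printf '\200' | sjis_hex
```

The trail byte 0x40 is "@" in ASCII, so a terminal that is not set to Shift_JIS would show "@" plus a stray character for the lead byte 0x84, which would explain the listing in the first post.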

Hi Yazu,

I am new to Unix; can you please explain in detail the steps to do that?

  • The extra characters I am getting are in the converted English file, which is in default ASCII mode.
  • I am using PuTTY with the encoding set to UTF-8.
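Since the terminal here is PuTTY set to UTF-8, one option is to convert straight to UTF-8 instead of looping over every encoding iconv knows. A minimal sketch, assuming the input really is CP866 and that the local iconv supports CP866 and UTF-8; the function name `cp866_to_utf8` and the file names are placeholders:

```shell
#!/bin/sh
# Hypothetical wrapper: convert a CP866 file to UTF-8.
# Usage: cp866_to_utf8 INPUT OUTPUT
cp866_to_utf8() {
    iconv -f CP866 -t UTF-8 "$1" > "$2"
}

# Demo on a throwaway file so the sketch is self-contained.
tmp=$(mktemp -d)
# CP866 bytes 0x80 0x81 0x82: the first three capital Cyrillic letters.
printf '\200\201\202\n' > "$tmp/test.txt"
cp866_to_utf8 "$tmp/test.txt" "$tmp/test.utf8.txt"
cat "$tmp/test.utf8.txt"
rm -rf "$tmp"
```

Russian letters have no representation in plain ASCII, so an ASCII target cannot hold them; UTF-8 can, and a UTF-8 terminal should then display the result directly.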

---------- Post updated at 04:08 PM ---------- Previous update was at 11:43 AM ----------

Hey, can anyone help me with this?

Out of curiosity, why did you recommend changing to Shift-JIS, which is a Japanese language encoding? CP866 does not map to Shift-JIS. The single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters in JIS X 0201.
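The point about single bytes can be checked with a small sketch, assuming an iconv that supports CP866, SHIFT_JIS and UTF-8. The helper name `utf8_hex_from` is hypothetical. It decodes the same byte, 0xA1, first as CP866 and then as Shift-JIS, and prints the UTF-8 bytes of each result.

```shell
#!/bin/sh
# Hypothetical helper: decode stdin from the encoding named in $1
# and print the UTF-8 result as one hex string.
utf8_hex_from() {
    hex=$(iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n')
    printf '%s\n' "$hex"
}

# \241 is octal for 0xA1.
printf '\241' | utf8_hex_from CP866       # Cyrillic small letter be, U+0431
printf '\241' | utf8_hex_from SHIFT_JIS   # halfwidth ideographic full stop, U+FF61
```

The same byte decodes to a Cyrillic letter under CP866 and to a half-width JIS X 0201 character under Shift-JIS, so reading raw CP866 bytes as Shift-JIS would not give Russian text.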