Help to Convert file from UNIX UTF-8 to Windows UTF-16

phanidhar6039 · January 30, 2014, 8:20am

Hi,

I tried to convert a UTF-8 file to a Windows UTF-16 file on a Unix machine with the command below:

unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt

and I am getting some Chinese characters, as below, when I open the converted file on a Windows machine.

LANG=en_US.UTF-8

LC_NUMERIC="en_US.UTF-8"

LC_COLLATE="en_US.UTF-8"

LC_MESSAGES="en_US.UTF-8"

LC_NAME="en_US.UTF-8"

LC_TELEPHONE="en_US.UTF-8"

LC_IDENTIFICATION="en_US.UTF-8"

This is just a test file I was working on; the actual file contains a number of rows and columns. Am I missing anything above?

The requirements for the output are as follows:

UTF-16 Little endian
preceded by a byte order mark (bytes ff and fe)
Windows line endings

Any pointers will be great

Thanks,
P

phanidhar6039 · February 4, 2014, 6:00am

Usually all files should be transferred in binary mode so that nothing gets changed and we don't get any unknown characters.

Let us take the input file to be Orgdata_UTF8.txt and the output file to be Orgdata.txt.

unix2dos < Orgdata_UTF8.txt | iconv -f UTF-8 -t UTF-16LE > Orgdata.txt

Some systems add the BOM by default and some don't, depending on the operating system and the iconv version. The encoding name matters in the same way: the result for UTF-16 and for UTF-16LE can differ between versions, so check what your system produces and use whichever name gives the required output.
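
A quick way to see what the local iconv does is to convert a single character and look at the raw bytes. This is only a sketch: first_bytes is a hypothetical helper name, not something from the thread, and od is used so that the bytes are listed in file order.

```shell
#!/bin/sh
# first_bytes (hypothetical helper): convert the letter "A" to the given
# encoding and print the resulting bytes as one hex string.
first_bytes() {
    printf 'A' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

# If a BOM is written, the output starts with fffe or feff.
echo "UTF-16:   $(first_bytes UTF-16)"
echo "UTF-16LE: $(first_bytes UTF-16LE)"
```

If the UTF-16LE line shows only 4100, the local iconv writes no BOM for that encoding name, and the BOM has to be added separately.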

Adding BOM manually

Create the output file Orgdata.txt containing only the BOM, check it with the file command to confirm that it is recognised as UTF-16 little endian, and then append the converted data to it as below:

printf "\xff\xfe" > Orgdata.txt
file Orgdata.txt 
unix2dos < Orgdata_UTF8.txt | iconv -f UTF-8 -t UTF-16LE >> Orgdata.txt
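
The same steps can be wrapped into one function, sketched below under a few assumptions. to_utf16le_bom is a hypothetical name, awk stands in for unix2dos so the sketch does not depend on that package, and the BOM is written with octal escapes because not every printf understands \x escapes. It also assumes the local iconv writes no BOM of its own for UTF-16LE.

```shell
#!/bin/sh
# to_utf16le_bom (hypothetical helper): write SRC as UTF-16LE with a
# leading BOM and CRLF line endings into DST.
to_utf16le_bom() {
    src=$1
    dst=$2
    {
        # BOM bytes ff fe, written as octal escapes.
        printf '\377\376'
        # Append CR before each LF (stand-in for unix2dos), then convert.
        awk '{ printf "%s\r\n", $0 }' "$src" | iconv -f UTF-8 -t UTF-16LE
    } > "$dst"
}

# Small demonstration on a throwaway file.
tmpdir=$(mktemp -d)
printf 'LANG=en_US.UTF-8\n' > "$tmpdir/in.txt"
to_utf16le_bom "$tmpdir/in.txt" "$tmpdir/out.txt"

# First four bytes in file order: the BOM followed by "L" as UTF-16LE.
head_bytes=$(od -An -tx1 -N4 "$tmpdir/out.txt" | tr -d ' \n')
size=$(wc -c < "$tmpdir/out.txt" | tr -d ' ')
echo "first four bytes: $head_bytes, size in bytes: $size"
rm -rf "$tmpdir"
```

Grouping the two commands in braces sends both to the output file in one redirection, so there is no need to create the file first and append to it.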

Use a hex dump tool to check whether the file starts with ff fe. How the result is displayed depends on which hex dump tool is used.

cat Orgdata.txt | hexdump | less     -- this shows as fe ff
xxd < Orgdata.txt | less             -- same file shows as ff fe

cat -vT Orgdata.txt

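In reality both show the same bytes. By default hexdump prints 16-bit words in the host byte order, so on a little-endian machine the bytes ff fe are displayed reversed as feff, while xxd prints the bytes in file order.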
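
The difference between the two views can be reproduced with od alone. The sketch below writes just the two BOM bytes to a temporary file and dumps them once byte by byte and once as 16-bit units; the variable names are only for illustration.

```shell
#!/bin/sh
# Write only the BOM bytes ff fe to a temporary file.
bom_file=$(mktemp)
printf '\377\376' > "$bom_file"

# One byte at a time: bytes appear in file order.
bytes=$(od -An -tx1 "$bom_file" | tr -d ' \n')
echo "byte view: $bytes"

# Two-byte units: od reads each pair in the host byte order.
words=$(od -An -tx2 "$bom_file" | tr -d ' \n')
echo "word view: $words"

rm -f "$bom_file"
```

The byte view prints fffe. On a little-endian machine the word view should print feff, which matches what plain hexdump displays.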

This has resolved my issue

fpmurphy · February 4, 2014, 9:51am

iconv and BOMs are a gray area in the Unicode specification. A useful discussion regarding iconv and presence or lack of a BOM is here

phanidhar6039 · February 4, 2014, 10:20am

Thanks for the reply, fpmurphy. I had already gone through that link while working towards the desired result.

Anyway thanks for the help.

Thanks,
P