Hi,
I tried to convert a UTF-8 file to a Windows UTF-16 file from a Unix machine, as below:
unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt
and I am getting some Chinese characters when I open the converted file on a Windows machine. The locale settings on the Unix machine are:
LANG=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
This is just a test file I was working on, but the actual file contains a number of rows and columns. Am I missing anything above?
The requirements for the output are as below:
- UTF-16 Little endian
- preceded by a byte order mark (BOM): ff fe
- Windows line endings
Any pointers would be great.
Thanks,
P
Usually all files should be transferred in binary mode so that nothing is changed in transit and we don't get any unknown characters.
Let us take the input file name as Orgdata_UTF8.txt and the output file as Orgdata.txt:
unix2dos < Orgdata_UTF8.txt |iconv -f UTF-8 -t UTF-16LE>Orgdata.txt
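Some systems add the BOM by default and some do not, depending on the operating system and the iconv implementation. The encoding name behaves similarly: depending on the version, the little-endian form may be selected as UTF-16 or as UTF-16LE, so use whichever name your system needs. Note that naming the byte order explicitly (UTF-16LE) typically means no BOM is written, which is why it may have to be added manually.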
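Because this differs between iconv implementations, it can help to probe the local one before relying on it. The following is only a minimal sketch, assuming iconv and a POSIX od are installed; the helper name probe_iconv_bom is hypothetical and not part of any tool.

```shell
# probe_iconv_bom ENCODING
# Hypothetical helper: converts the single character "A" to ENCODING and
# prints the resulting bytes as one hex string, in file order.
probe_iconv_bom() {
    printf 'A' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

probe_iconv_bom UTF-16     # a leading fffe or feff means this iconv wrote a BOM
probe_iconv_bom UTF-16LE   # 4100 with nothing in front means no BOM was written
```

If the UTF-16 line starts with feff, the local iconv is producing big-endian output for that name, which is not what the requirements above ask for.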
Adding BOM manually
Create the new file Orgdata.txt containing only the BOM, check its type with the file command to confirm that it is recognised as UTF-16LE, and then append the converted text to it:
printf "\xff\xfe" > Orgdata.txt
file Orgdata.txt
unix2dos < Orgdata_UTF8.txt |iconv -f UTF-8 -t UTF-16LE>>Orgdata.txt
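The same steps can be wrapped into one function. This is a sketch under assumptions, not the exact commands used above: the name to_utf16le_bom is hypothetical, awk stands in for unix2dos in case it is not installed, and the BOM is written with octal escapes because not every printf understands \x.

```shell
# to_utf16le_bom SRC DST
# Hypothetical helper: writes DST as UTF-16LE with a leading BOM and CRLF
# line endings, reading SRC as UTF-8.
# Usage: to_utf16le_bom Orgdata_UTF8.txt Orgdata.txt
to_utf16le_bom() {
    {
        # BOM bytes ff fe, written as octal escapes for portability
        printf '\377\376'
        # Drop any existing CR, then end every line with CR LF
        # (assumption: used here instead of unix2dos)
        awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1" |
            iconv -f UTF-8 -t UTF-16LE
    } > "$2"
}
```

Writing both parts through one redirection avoids the separate create and append steps, so a leftover Orgdata.txt from an earlier run cannot end up with two BOMs.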
Use a hex dump to check whether the file starts with the desired ff fe. The displayed result varies depending on the hex dump tool used.
cat Orgdata.txt | hexdump | less    # this shows fe ff
xxd < Orgdata.txt | less            # same file shows ff fe
cat -vT Orgdata.txt
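In reality both show the same file. By default hexdump prints two-byte words in the host byte order, so on a little-endian machine the bytes ff fe are displayed swapped, while xxd prints the bytes in file order.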
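To check the first two bytes without depending on how a given tool groups them, od can print single bytes in file order. A minimal sketch, assuming a POSIX od; the function name has_utf16le_bom is hypothetical.

```shell
# has_utf16le_bom FILE
# Hypothetical helper: succeeds if FILE starts with the bytes ff fe.
# Usage: has_utf16le_bom Orgdata.txt && echo "BOM present"
has_utf16le_bom() {
    # -N2 reads only the first two bytes, -tx1 prints them one byte at a time
    first=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$first" = "fffe" ]
}
```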
This has resolved my issue
iconv and BOMs are a gray area in the Unicode specification. A useful discussion regarding iconv and presence or lack of a BOM is here
Thanks for the reply, fpmurphy. I had already gone through that link while getting the desired result.
Anyway thanks for the help.
Thanks,
P