I need help converting a directory of *.txt files with Windows line endings to UTF-8 character encoding and Unix/Linux line endings.
Usually you would use a tool like dos2ux or dos2unix. Do you have either one? (The name may differ depending on the OS.)
Available in a repository?
---------- Post updated at 10:41 AM ---------- Previous update was at 10:38 AM ----------
Thanks for the quick response, @vbe.
Not having much luck with dos2unix. How would I change all files in a directory from UNICODE to UTF-8?
Can't get this to work...
dos2unix -f UNICODE -t UTF-8 *
That question doesn't really make much sense. Unicode is not an encoding; it is a character set. UTF-8 is one of many ways to encode the Unicode character set. This means you have yet to state the source's encoding.
In any case, you probably want to take a look at iconv for the transcoding aspect of the task.
Regards,
Alister
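Since the source encoding has to be known before iconv can be told what to convert from, one rough way to check is to look at the first bytes of a file for a byte order mark (BOM). The following is only a sketch: guess_bom is a made-up helper name, it assumes head -c and od are available, and a file saved without a BOM will simply be reported as unknown.

```shell
# Guess a file's encoding from its byte order mark (BOM), if it has one.
# guess_bom is a hypothetical helper name, not part of any standard tool.
guess_bom() {
    # First three bytes as lowercase hex with spaces and newlines removed.
    bom=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        efbbbf*) echo "UTF-8" ;;
        # Caveat: a UTF-32LE BOM also starts with ff fe and would land here.
        fffe*)   echo "UTF-16LE" ;;
        feff*)   echo "UTF-16BE" ;;
        *)       echo "unknown" ;;
    esac
}
```

Calling guess_bom on one of the log files should give a reasonable hint about what to pass to iconv as the source encoding.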
If they're Unicode text files from Windows, they're probably UTF-16.
Interesting. When I open Notepad in Windows and click on save as, there are 4 options for encoding: ANSI, Unicode, Unicode big endian, UTF-8.
The logs I am referring to are encoded, at least according to Notepad, in Unicode. When I change them to UTF-8 in Notepad, they look right with the cat command in Cygwin.
Crosspost.
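What Windows calls "Unicode" is UTF-16 (little endian, with a byte order mark). "Unicode big endian" is the same thing with the byte order reversed.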
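Putting that together, a conversion of the whole directory could look roughly like the sketch below. It assumes every *.txt file really is UTF-16 with a BOM (which is what Notepad's "Unicode" option writes), so that iconv can work out the byte order itself. convert_dir is a made-up helper name, and the files are rewritten in place, so try it on a copy of the directory first.

```shell
# Convert every *.txt file in a directory from UTF-16 (with BOM) to UTF-8
# with Unix line endings. convert_dir is a hypothetical helper name.
convert_dir() {
    dir=$1
    for f in "$dir"/*.txt; do
        # Skip the unexpanded pattern when no .txt files exist.
        [ -f "$f" ] || continue
        # Transcode into a temporary file first, so the original is left
        # untouched if iconv fails (for example, on a file that is not UTF-16).
        if iconv -f UTF-16 -t UTF-8 "$f" > "$f.utf8"; then
            # Remove every carriage return, turning CRLF into LF.
            tr -d '\r' < "$f.utf8" > "$f"
        fi
        rm -f "$f.utf8"
    done
}
```

Usage would be something like convert_dir /path/to/logs. Note that tr removes every carriage return, not only those that are part of a CRLF pair, which is normally fine for log files.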