Convert directory of text files to Unix/Linux Line Ending

chipperuga · August 18, 2011, 10:28am

I need help converting a directory of *.txt files with Windows line endings to UTF-8 character encoding and Unix/Linux line endings.

vbe · August 18, 2011, 10:34am

Usually you would use a tool like dos2ux or dos2unix. Do you have either of them? (The name may differ depending on the OS...)

chipperuga · August 18, 2011, 10:41am

Available in a repository?

Thanks for the quick response, @vbe.

jville · August 18, 2011, 10:41am

Take a look here: Removing control-Ms (^M)

vbe · August 18, 2011, 10:57am

Miscellaneous software tools (need to be compiled...)

chipperuga · August 18, 2011, 4:15pm

Not having much luck with dos2unix. How would I change all files in a directory from UNICODE to UTF-8?

Can't get this to work...

dos2unix -f UNICODE -t UTF-8 *

alister · August 18, 2011, 5:58pm

That question doesn't really make much sense. Unicode is not an encoding; it is a character set. UTF-8 is one of many ways to encode the Unicode character set. This means you have yet to state the source's encoding.

In any case, you probably want to take a look at iconv for the transcoding aspect of the task.
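
Before picking a value for iconv's -f option, one way to guess the source encoding is to look at the first few bytes of a file for a byte order mark (BOM). This is only a rough sketch: show_bom is a made-up helper name, and a file without a BOM tells you nothing this way.

```shell
# Hypothetical helper: print the first three bytes of a file in hex.
#   ff fe ..  -> UTF-16, little endian
#   fe ff ..  -> UTF-16, big endian
#   ef bb bf  -> UTF-8 with a BOM
show_bom() {
    od -An -tx1 -N3 "$1"
}
```

Calling show_bom on one of the logs (for example show_bom somefile.txt, where the file name is a placeholder) prints the leading bytes so they can be compared against the list in the comments.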

Regards,
Alister

Corona688 · August 18, 2011, 6:29pm

If they're Unicode text files from Windows, they're probably UTF-16.

chipperuga · August 18, 2011, 6:30pm

Interesting. When I open Notepad in Windows and click on save as, there are 4 options for encoding: ANSI, Unicode, Unicode big endian, UTF-8.

The logs I am referring to are encoded, at least according to Notepad, in Unicode. I'm able to change them to UTF-8 and they look right with the cat command in cygwin.

Corona688 · August 18, 2011, 6:31pm

Crosspost.

What Windows calls "Unicode" is UTF-16.
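
Putting the thread together, here is a minimal sketch of the whole conversion. It assumes that iconv is installed and that every *.txt file really is UTF-16 with a BOM, which is what Notepad's "Unicode" option writes. convert_dir is a made-up name, not an existing tool.

```shell
# Hypothetical helper: convert every *.txt in a directory from UTF-16
# to UTF-8 and strip the carriage returns left over from CRLF endings.
convert_dir() {
    dir=$1
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue        # skip when the glob matched nothing
        tmp="$f.tmp.$$"
        # Transcode first; only then is CR a single byte that tr can delete safely.
        if iconv -f UTF-16 -t UTF-8 "$f" > "$tmp" &&
           tr -d '\r' < "$tmp" > "$tmp.lf"; then
            mv "$tmp.lf" "$f"          # replace the original only on success
        else
            echo "failed to convert: $f" >&2
        fi
        rm -f "$tmp" "$tmp.lf"         # clean up whatever temporary files remain
    done
}
```

Usage would be something like convert_dir /path/to/logs, with the path as a placeholder. The order matters: running tr on the raw UTF-16 bytes would corrupt the text, so the CR removal happens after iconv. The function is not meant to be run twice on the same directory, because the second pass would treat UTF-8 files as UTF-16. Try it on a copy first.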