Converting DOS filetype to UNIX

ksk · July 14, 2012, 5:35pm

Hello folks

I am working on a project that requires me to write a script that operates on a bunch of text files. When I try

less file.txt

I see a bunch of ^M's everywhere. Some Googling told me that this is because the files are in DOS file format, and I found the following fixes:

sed 's/^M$//' file.txt > output.txt

or

dos2unix file.txt

Alas, neither of these approaches seems to work. When I open the file in gedit, everything is formatted appropriately, and if I use gedit to save the file as file1.txt, the new file is fine when I do

less file1.txt

too. Is there anything I can do to fix the file from the command line?

---------- Post updated at 03:05 AM ---------- Previous update was at 03:00 AM ----------

Sorry folks, I just realized that the files are in Mac OS Classic format (bare CR line endings), not DOS. I wrote a little awk script that fixes things now. I'm very sorry for the confusion. I'd appreciate it if a moderator could remove the thread.

Thanks again
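Because the fix depends on which convention a file actually uses, it can help to check before converting. The function below is a hypothetical helper (`detect_eol` is not a standard command), a rough heuristic that only counts CR and LF bytes with tr and wc; it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: guess a file's line-ending convention by counting
# carriage returns (CR, \r) and line feeds (LF, \n). Heuristic only.
detect_eol() {
    # $(( ... )) strips the leading spaces that some wc implementations print
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr" -eq 0 ]; then
        echo "unix"      # LF only (or no line breaks at all)
    elif [ "$lf" -eq 0 ]; then
        echo "mac"       # CR only: Mac OS Classic
    elif [ "$cr" -eq "$lf" ]; then
        echo "dos"       # as many CRs as LFs: most likely CR LF pairs
    else
        echo "mixed"     # some other combination
    fi
}
```

Running `detect_eol file.txt` prints one of unix, mac, dos or mixed; a file like the one in this thread would report mac.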

methyl · July 14, 2012, 6:11pm

Just for completeness, in case someone Googles this thread: here is how to do it on any Unix/Linux system for files that really are in DOS format (as opposed to early Mac OS format).

The quick way (delete the carriage-return characters using the correct method):

tr -d '\r' < msdos_file.txt > newfilename.txt
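To see what the `tr -d '\r'` one-liner does, the sketch below builds a tiny CR LF sample (the sample contents are made up for illustration), converts it, and dumps the result byte by byte with od.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# A two-line DOS-style sample: each line ends in CR LF (\r\n).
printf 'line one\r\nline two\r\n' > msdos_file.txt

# Delete every carriage return; the LFs that remain are Unix line endings.
tr -d '\r' < msdos_file.txt > newfilename.txt

# od -c prints one character per column, so any leftover \r would show up.
od -c newfilename.txt
```

Keep in mind that `tr -d '\r'` removes every CR in the file, not only those at the ends of lines; for ordinary DOS text that is what you want.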

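Note about dos2unix / dos2ux: on many systems (e.g. dos2ux on HP-UX) these are not in-situ editors; they write the converted text to standard output, so redirect it to a new file. (The dos2unix package common on Linux, by contrast, converts files in place by default.)

dos2unix < filename.txt > newfilename.txt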
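When a converter only writes to standard output, in-place conversion can be emulated with a temporary file. The function below is a hypothetical sketch (`strip_cr_in_place` is not part of dos2unix); it uses `tr -d '\r'` as the filter and assumes `mktemp` is available, which it is on most current systems.

```shell
#!/bin/sh
# Hypothetical helper: emulate in-place CR removal for each file named,
# by converting into a temporary file and copying the result back.
strip_cr_in_place() {
    for f in "$@"; do
        tmpfile=$(mktemp) || return 1
        # tr -d '\r' stands in for any stdout-only filter
        if tr -d '\r' < "$f" > "$tmpfile"; then
            # cat > "$f" (rather than mv) keeps the original file's
            # permissions and ownership
            cat "$tmpfile" > "$f"
        fi
        rm -f "$tmpfile"
    done
}
```

For example, `strip_cr_in_place *.txt` rewrites every .txt file in the current directory.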

mregine · July 15, 2012, 11:24am

More completeness: try "recode", which converts between a wide variety of encodings. It can be installed using MacPorts on the Mac.

$ recode -l

lists all encodings it knows about

$ recode old..new file

converts file from the "old" encoding to the "new" one, rewriting it in place.

ksk · July 15, 2012, 1:37pm

Thanks to both of you. In addition, my way of doing it (from ARCHIVED: How do I convert between Unix and Mac OS or Mac OS X text files? - Knowledge Base) is

   awk '{ gsub("\r", "\n"); print $0;}' macfile.txt > unixfile.txt

to go from Mac OS to Unix/Linux.

alister · July 15, 2012, 4:40pm

ksk:

Thanks to both of you. In addition, my way of doing it (from ARCHIVED: How do I convert between Unix and Mac OS or Mac OS X text files? - Knowledge Base) is
   awk '{ gsub("\r", "\n"); print $0;}' macfile.txt > unixfile.txt 
to go from Mac OS to Unix/Linux.

That is not a very good solution.

That approach will add an extra blank line during the conversion, because after converting the last \r to \n, the awk print statement will add another \n.
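The extra blank line is easy to reproduce. In this sketch the input is a made-up two-line Mac OS Classic sample ("a" and "b", each ended by a bare CR), and line counts are compared against a plain tr translation.

```shell
#!/bin/sh
# gsub approach: the whole input is one record; both CRs become LFs, then
# print appends ORS (another LF), so the output is a\nb\n\n.
printf 'a\rb\r' | awk '{ gsub("\r", "\n"); print $0;}' | wc -l   # 3 lines

# tr maps each CR to exactly one LF, so the output is a\nb\n.
printf 'a\rb\r' | tr '\r' '\n' | wc -l                           # 2 lines
```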

If the text file is very large, the conversion will also require a large amount of memory, because awk keeps reading until it finds a \n record separator (which is absent), so the whole file becomes a single record. Some awks also impose a line-length limit, in which case the attempt would fail even if you had sufficient memory.

In my opinion, the tr utility is tailor-made for this task. It's widely available, simple to use, will not add an extra blank line, and will not consume large amounts of memory even when dealing with monstrous files.

tr '\r' '\n' < macfile.txt > unixfile.txt
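Since the original task involves a bunch of text files, the same tr translation can be wrapped in a loop. This is a hypothetical sketch: the function name `cr_to_lf` and the `.unix.txt` output naming are illustrative choices, not an existing tool.

```shell
#!/bin/sh
# Hypothetical batch conversion: for each CR-terminated (Mac OS Classic)
# file given, write a Unix copy next to it as NAME.unix.txt.
cr_to_lf() {
    for f in "$@"; do
        # ${f%.txt} drops a trailing ".txt", so a.txt becomes a.unix.txt
        tr '\r' '\n' < "$f" > "${f%.txt}.unix.txt" || return 1
    done
}
```

Usage would be `cr_to_lf *.txt`; running it a second time would also pick up the generated .unix.txt files, so clean those up or write them to another directory first.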

If you insist on using awk, the following is a much better approach (it won't add the extra blank line nor slurp the entire file into memory unless the entire file is one \r-terminated line):

awk 1 RS='\r' ORS='\n' macfile.txt > unixfile.txt
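A quick way to check that the RS/ORS version behaves like tr is to run both on the same input and compare the bytes with cmp. The sample text below is made up for illustration; the `1` is an always-true awk pattern whose default action prints each record.

```shell
#!/bin/sh
# Work in a scratch directory for the sample files.
cd "$(mktemp -d)" || exit 1

# Sample Mac OS Classic text: three CR-terminated lines.
printf 'one\rtwo\rthree\r' > macfile.txt

# Each CR-terminated record is printed followed by ORS, a single LF.
awk 1 RS='\r' ORS='\n' macfile.txt > unixfile.txt

# cmp reads the tr output from standard input ("-") and compares it.
tr '\r' '\n' < macfile.txt | cmp - unixfile.txt && echo identical
```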

My critique aside, thank you for reporting back with your solution. It's always good (and helpful for those who will search the forum in the future) to know how problems were resolved.

Regards,
Alister

ksk · July 15, 2012, 4:44pm

Fair enough, Alister. Thanks, I'll use tr from now on.