I am working on a project that requires me to write a script that operates on a bunch of text files. When I try
less file.txt
I see ^M characters everywhere. Some Googling tells me that this is because the files are in DOS format, and I found the following fixes:
sed 's/^M$//' file.txt > output.txt
or
dos2unix file.txt
Alas, neither of these approaches seems to work. When I open the file in gedit, everything is formatted correctly, and if I use gedit to save the file as file1.txt, the new file looks fine when I run
less file1.txt
too. Is there anything I can do to fix the file from the command line?
---------- Post updated at 03:05 AM ---------- Previous update was at 03:00 AM ----------
Sorry folks, I just realized that the files are in classic Mac OS format (lines end with a lone \r), not DOS format. I wrote a little awk script that fixes things now. I'm very sorry for the confusion. I'd appreciate it if a moderator could remove the thread.
That approach will add an extra blank line during the conversion, because after converting the last \r to \n, the awk print statement will add another \n.
If the text file is very large, the conversion will also require a large amount of memory, because awk keeps reading until it finds a \n record separator, and a \r-only file contains none, so the whole file becomes a single record. Some awk implementations also impose a line length limit, in which case the attempt would fail even with sufficient memory.
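To illustrate the single-record behaviour described above, here is a small sketch. The file name cr_only.txt is hypothetical, and the sample content is generated with printf purely for demonstration.

```shell
# Build a tiny sample whose lines end with a lone CR (classic Mac OS style).
# The file name is hypothetical.
printf 'first\rsecond\rthird\r' > cr_only.txt

# With the default record separator (newline), awk finds no separator at all,
# so it treats the entire file as one record.
awk 'END { print NR }' cr_only.txt

# With RS set to CR, each line becomes its own record.
awk 'END { print NR }' RS='\r' cr_only.txt
```

The first command should report 1 record and the second 3, which is why the default settings force awk to hold the whole file in memory at once.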
In my opinion, the tr utility is tailor-made for this task. It's widely available, simple to use, will not add an extra blank line, and will not consume large amounts of memory even when dealing with monstrous files.
tr '\r' '\n' < file.txt > output.txt
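As a quick way to try this out, the sketch below builds a small \r-terminated sample and converts it. The file names sample_cr.txt and sample_lf.txt are made up for the example; tr only reads standard input and writes standard output, so redirection is needed on both sides.

```shell
# Create a small sample with CR-only line endings (hypothetical file name).
printf 'first\rsecond\rthird\r' > sample_cr.txt

# Translate every CR into LF. The input file is left untouched;
# the converted text goes to a separate output file.
tr '\r' '\n' < sample_cr.txt > sample_lf.txt

# Count the newlines in the result. The unconverted file would report 0.
wc -l < sample_lf.txt
```

Do not redirect the output to the same file you are reading from, because the shell truncates the output file before tr gets a chance to read it.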
If you insist on using awk, the following is a much better approach (it neither adds the extra blank line nor slurps the entire file into memory, unless the entire file is one \r-terminated line):
awk 1 RS='\r' ORS='\n' file.txt > output.txt
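To check the claim about the extra blank line, one could run the awk command on a small generated sample and compare sizes. This is only a sketch; mac_sample.txt and awk_out.txt are hypothetical names.

```shell
# Create a small sample with CR-only line endings (hypothetical file name).
printf 'first\rsecond\rthird\r' > mac_sample.txt

# Pattern "1" is always true, so every record is printed.
# RS='\r' splits the input on CR; ORS='\n' terminates each output record with LF.
# The assignments must come before the file name so they apply when it is read.
awk 1 RS='\r' ORS='\n' mac_sample.txt > awk_out.txt

# Compare byte counts: if no extra blank line was added, both files
# have the same size, since each CR was replaced by exactly one LF.
wc -c < mac_sample.txt
wc -c < awk_out.txt
```

Both counts should be identical. A version that appended an extra newline would produce an output file one byte longer than the input.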
My critique aside, thank you for reporting back with your solution. It's always good (and helpful for those who will search the forum in the future) to know how problems were resolved.