HOW: Dealing with the newline character, Windows vs. UNIX

shekharjchandra · May 4, 2011, 6:14am

Hi
I am getting lines like the ones below by splitting one big file with the following code in my shell script.

Line is:-

111111111|  +.00|12/11/04|12/11/05|n
222222222|  +.00|12/11/05|12/11/06|n
333333333|  +.00|12/11/06|10/11/07|n

Code is:-

 
...
...
...
nawk -F"|" -v v1="${v_pno}" -v v2="${v_nid}" -v v3="${v_ipath}" '{
fn=v3 "/AAA_" (length($1)==2?"BBBB":"CCCCC") "_" v1 "_" v2 ".txt"
print > fn
}' ${v_ipath}/${v_pno}_spoolfile.txt
if [ $? -eq 0 ]
then
  sed 's/[ ]*$/\n/' ${v_ipath}/AAA_CCCCC_${v_pno}_${v_nid}.txt > ${v_ipath}/dummy.txt
  mv ${v_ipath}/dummy.txt ${v_ipath}/AAA_CCCCC_${v_pno}_${v_nid}.txt
...
...
...

But I want them in the following format:

 
111111111|  +.00|12/11/04|12/11/05|n
222222222|  +.00|12/11/05|12/11/06|n
333333333|  +.00|12/11/06|10/11/07|n

When I open the file in the UNIX vi editor, I get the required format, i.e.

111111111|  +.00|12/11/04|12/11/05|n
222222222|  +.00|12/11/05|12/11/06|n
333333333|  +.00|12/11/06|10/11/07|n

But if I open the same file in Windows, I get this format:
111111111| +.00|12/11/04|12/11/05|n^222222222| +.00|12/11/05|12/11/06|n^
333333333| +.00|12/11/06|10/11/07|n

In place of the ^ character I actually see a box-type character (I cannot show it in this editor, so I replaced it with the ^ symbol).

Any other good way of getting the required format would be highly appreciated.
Regards
jc

kevintse · May 4, 2011, 6:25am

You probably need unix2dos and dos2unix.
Windows uses 0x0d 0x0a (CR LF) as its newline, while Unix uses 0x0a (LF).
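
To make the byte-level difference visible, here is a minimal sketch that writes the same two lines with each kind of line ending and dumps the bytes with od. The file names unix_sample.txt and dos_sample.txt are made up for the illustration and are not from the original posts.

```shell
# Write the same two-line text with Unix (LF) and Windows (CR LF) line endings.
# Both file names are hypothetical.
printf 'a|b\nc|d\n' > unix_sample.txt
printf 'a|b\r\nc|d\r\n' > dos_sample.txt

# od -c prints every byte; \r appears only in the Windows-style file.
od -c unix_sample.txt
od -c dos_sample.txt

# The Windows-style file is one byte longer per line.
# tr strips the padding some wc implementations add around the number.
unix_size=$(wc -c < unix_sample.txt | tr -d ' ')
dos_size=$(wc -c < dos_sample.txt | tr -d ' ')
echo "unix: $unix_size bytes, dos: $dos_size bytes"
```

A file that shows only \n in the od output has Unix line endings, which some Windows editors do not treat as line breaks.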

shekharjchandra · May 4, 2011, 6:30am

Can anyone give an example of using unix2dos and dos2unix?

kevintse · May 4, 2011, 6:32am

# this command changes all 0x0a to 0x0d0a in your_file
unix2dos your_file

shekharjchandra · May 4, 2011, 6:56am

Hi Kevintse
I tried it the following way:

$ --> unix2dos file1.txt > file2.txt
could not open /dev/kbd to get keyboard type US keyboard assumed
could not get keyboard type US keyboard assumed

My Regional Options are set to English (United States).

Do you think there is any way I can do this 0x0a to 0x0d0a conversion in my sed command?

Regards

---------- Post updated at 11:56 AM ---------- Previous update was at 11:46 AM ----------

I tested transferring the same file by ftp in ASCII mode with mget. When I opened that file in Windows Notepad, it displayed in the required format.

But is there a way to do something similar in code, using sed or another tool? (I am not using ftp.)

kevintse · May 4, 2011, 9:26am

Yea, you can use sed:

sed 's/$/\r/' infile > dos.txt

BTW, when you use unix2dos, you don't do

unix2dos file1.txt > file2.txt

you simply do

unix2dos file1.txt

The unix2dos found on most Linux systems converts your files in place; other versions may expect a separate output file.
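
If unix2dos is not available, or the local sed does not interpret \r in the replacement text (some versions emit a literal r instead), awk is one alternative that may work. This is only a sketch: the file names lf_sample.txt and crlf_sample.txt are hypothetical, and it assumes an awk that understands \r, which POSIX awk and nawk are expected to.

```shell
# Build a small Unix-style sample to convert (hypothetical file name).
printf '111|+.00|n\n222|+.00|n\n' > lf_sample.txt

# Strip any CR already at the end of the line, then print the line followed
# by CR LF. Running this twice therefore does not pile up extra CRs.
awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' lf_sample.txt > crlf_sample.txt

# Count the lines that now end in CR; tr makes each CR visible as '@' first.
crlf_lines=$(tr '\r' '@' < crlf_sample.txt | grep -c '@$')
echo "lines ending in CR: $crlf_lines"
```

The same printf "%s\r\n" form could also be used directly inside an awk script that writes the split files, which would avoid a separate conversion pass.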

shekharjchandra · May 4, 2011, 10:01am

Hi, suppose my infile contains data in the following format:

002010764| +.00|12/11/04|12/11/05|n^002010764| +.00|12/11/05|12/11/06|n^002010764| +.00|12/11/06|10/11/07|n^

where ^ stands for a box-type character.
Then, if I use the command you gave:

 
sed 's/$/\r/' infile > dos.txt

I should get the rows in the following format:

002010764| +.00|12/11/04|12/11/05|n
002010764| +.00|12/11/05|12/11/06|n
002010764| +.00|12/11/06|10/11/07|n

When I tried this command:
sed 's/$/\r/' infile > dos.txt

I got output like this:
002010764| +.00|12/11/04|12/11/05|nr^002010764| +.00|12/11/05|12/11/06|nr^002010764| +.00|12/11/06|10/11/07|nr^

^ is a box-type character.

PS: The files generated by my shell script are placed in a folder on a UNIX mount point. The split files are then accessed from a Windows machine.
This is where I get the issue, when I open the split files.

Also I tried the command

unix2dos file1.txt

The issue is still the same.

Thanks for looking into my problem.

kevintse · May 4, 2011, 10:43am

I have no idea what that box-type character is; it doesn't look like a newline character, though.

You can use hexdump or xxd to find the hex representation of that character:

head -1 your_file | xxd

Let's say the character is 0x09; with GNU sed you can use the following command to replace that character with a DOS newline:

sed 's/\x09/\r\n/g' infile > outfile

Hope this helps.
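
For systems without xxd or GNU sed, the same two steps (inspect the byte, then replace it) can be sketched with od, tr and awk. Everything here is an assumption for illustration: the file names are hypothetical, and a TAB (0x09, octal 011) merely stands in for the unknown box character, whose real value has to be read from the od output. One possibility worth checking is that od shows a plain \n there, in which case only the CR is missing and the tr step is unnecessary.

```shell
# Hypothetical sample: three records joined by a TAB (0x09) standing in for
# the unknown "box" character.
printf '001|n\t002|n\t003|n\t' > joined_sample.txt

# Inspect the bytes to see which value separates the records.
od -c joined_sample.txt | head -2

# Turn the separator (octal 011 = 0x09) into LF, then end each line with CR LF.
tr '\011' '\n' < joined_sample.txt | awk '{ printf "%s\r\n", $0 }' > split_sample.txt

# Count the resulting records.
record_count=$(awk 'END { print NR }' split_sample.txt)
echo "records: $record_count"
```

tr accepts octal escapes, so a byte found as hex in the dump needs converting to octal before it goes into the tr argument.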