[Solved] CSV to CSV MS-DOS

Hello,

I have the following script which converts data in a CSV file from one format to another. However, I first have to convert the CSV (Macintosh) file to CSV or CSV (MS-DOS), otherwise the script won't work. I do this in Excel. I know it has something to do with the line endings: CSV (Macintosh) saves with CR as the line terminator, while CSV (Comma delimited) and CSV (MS-DOS) both use CR/LF.

Is there a way to add something to the top of my script that negates the need for me to convert manually?

#!/bin/ksh

# Require exactly one argument that names an existing file.
if [ $# -ne 1 ] || [ ! -f "$1" ]
then    echo "Usage: $0 file" >&2
        exit 1
fi
awk '
  NR==1{                       # replace the header line
    $0="header1,header2,headerN"
  }
  NR>1{
    # Split any field of the form xxx/xxx into two comma-separated values.
    for(i=2; i<=NF; i++) if($i~/^[KMG0-9]+\/[KMG0-9]+$/) sub("/",",",$i)
    $0=$0                      # reassign $0 so awk re-splits the fields
    # Expand K/M/G suffixes: the letter marks the decimal point and the
    # magnitude (K=10^3, M=10^6, G=10^9), e.g. 21K7 -> 21.7*10^3 = 21700.
    for(i=2; i<=NF; i++) if($i~/^[KMG0-9]+$/){
      p=0
      if($i~/G/) p=9
      if($i~/M/) p=6
      if($i~/K/) p=3
      sub(/[GMK]/,".",$i)
      $i*=10^p
    }
  }
  1
' FS=, OFS=, "$1"
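
For what it's worth, the K/M/G suffix arithmetic in the loop above can be tried in isolation (the sample value 21K7 is taken from later in the thread):

```shell
# Demonstrate the suffix trick from the script: the letter marks the
# decimal point, so replace it with "." and scale by 10^3 for K.
echo '21K7' | awk '{p=3; sub(/K/, ".", $1); print $1 * 10^p}'
# prints 21700
```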

Check whether you have the ux2dos or unix2dos utilities before writing something from scratch... (and dos2ux/dos2unix to convert back).

Thanks. I tried unix2dos as follows:

unix2dos -437 -o Test.csv

This seemed to add the CR/LF line endings. However, the script still fails; I still have to open the file in Excel and save it as CSV or CSV (MS-DOS).
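
One possible reason unix2dos didn't help here: a CSV (Macintosh) file uses a bare CR as the line terminator, and unix2dos only converts LF to CR/LF, so a CR-only file passes through essentially unchanged. A minimal sketch that normalises the Mac endings first (the file names are placeholders):

```shell
# Classic-Mac files end lines with CR alone; turn each CR into LF
# so line-oriented tools like awk see one record per line.
tr '\r' '\n' < mac.csv > unix.csv
```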

Sample input

232/344
21K7/5K48
5K32/4K85

Expected output

232,344
21700,5480
5320,4850

unix2dos does what it says, unix to dos. It doesn't turn / into ,.

Converting the / is easy though.

tr '/' ',' < inputfile > outputfile

Thanks Corona. That seems to work, except that it also splits the date. How can I amend it so that it ignores column 2?

The example you showed doesn't have any dates in it.

Show representative input and representative output and we can try. Otherwise we'll blindly do things which do what you say, not what you actually want.

The data has many columns, so the script above searches for any column with the format xxx/xxx and then converts it according to the calculations. It also avoids column 2.

I hope the following is sufficient.

Sample input.

19347222,25/02/2013 06:46,1361803573,232/344
19347290,25/02/2013 06:46,1361803583,21K7/5K48
19347359,25/02/2013 06:46,1361803588,5K32/4K85

Expected output

19347222,25/02/2013 06:46,1361803573,232,344
19347290,25/02/2013 06:46,1361803583,21700,5480
19347359,25/02/2013 06:46,1361803588,5320,4850

It looks like that if I first replace "/" with "," using the tr utility and then run the script, it works. However, it's of course also splitting the date. So I need to incorporate something into the script so that it replaces the delimiter first, skips the date column, and then runs the calculations on xxx/xxx.

So it's basically CSV already, but the last column is a problem. OK. You can tell awk to substitute a , for the / in the very last column. You can also tell it to print \r\n at the end of each line, so you don't need unix2dos afterwards:

awk '{sub(/\//, ",", $NF)} 1' FS="," OFS="," ORS="\r\n" inputfile > outputfile

Thanks Corona. But as I said, there are many columns (way too many to paste here) and several of them have the format xxx/xxx, not just the last column. It just needs to ignore column 2 and change the delimiter everywhere else.

Thanks

Isn't that column 4?

awk '{sub(/\//, ",", $4)} 1' FS="," OFS="," ORS="\r\n" inputfile > outputfile

Correct, column 4 in the sample input. However, it also applies to columns 8, 12, 16, 20, etc.

In a regular pattern? Every 4 columns?

awk '{for(N=4; N<=NF; N+=4) sub(/\//, ",", $N) } 1' FS="," OFS="," ORS="\r\n" inputfile > outputfile
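
Putting the thread together, a hedged end-to-end sketch: strip the CR-only Mac line endings, rewrite the header, split every fourth column on /, and emit CR/LF so Excel is happy. The file names and header text are placeholders:

```shell
#!/bin/sh
# mac.csv / dos.csv and the header names are hypothetical.
tr '\r' '\n' < mac.csv |               # CR-only -> LF
awk '
  NR == 1 { $0 = "header1,header2,headerN" }          # replace header row
  { for (N = 4; N <= NF; N += 4) sub(/\//, ",", $N) } # split cols 4, 8, 12...
  1
' FS=',' OFS=',' ORS='\r\n' > dos.csv  # end every line with CR/LF
```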

Correct.


Thanks, works just fine.
