Converting a comma-separated file to a pipe-delimited file

Hi,

I came across a very good script in this forum that converts a comma-separated file to a pipe-delimited one. The script covers most of my requirements, but it does not seem to handle embedded double quotes and commas. For example, if the input is:

 
1234, "value","first,second", "LDC5"monitor", "three""","four" 

The desired output is:

1234|value|first,second|LDC5"monitor|three""|four

but the actual output is:

1234|value|first|second|LDC5monitor|three"|four"

Please suggest whether the code below can be modified so that it handles commas and double quotes inside a quoted field.

BEGIN { FS=SUBSEP; OFS="|" }

{
  result = setcsv($0, ",")
  print
}

function setcsv(str, sep, i) {
  gsub(/""/, "\035", str)
  gsub(sep, FS, str)

  while (match(str, /"[^"]*"/)) {
    middle = substr(str, RSTART+1, RLENGTH-2)
    gsub(FS, sep, middle)
    str = sprintf("%.*s%s%s", RSTART-1, str, middle,
      substr(str, RSTART+RLENGTH))
  }
  if (index(str, "\"")) {
    return ((getline) > 0) ? setcsv(str (RT != "" ? RT : RS) $0, sep) : !setcsv(str "\"", sep)
  } else {
    gsub(/\035/, "\"", str)
    $0 = str

    for (i = 1; i <= NF; i++)
      if (match($i, /^"+$/))
        $i = substr($i, 2)

    $1 = $1 ""
    return 1
  }
}

Something like this?

awk -F, '{gsub(",", "|"); gsub("\"", "")}1' OFS="|" file

This will also replace the commas inside a quoted field with a pipe,

i.e. 1234,"test,file" becomes 1234|test|file

There should be a better way in sed but this will do what you want...

sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' infile

Or awk...

awk '{gsub(" ","");gsub("\"$", "");gsub("\",\"", "|");gsub(",\"","|")}1' infile

Thanks for the updates, but this doesn't seem to work. Please try the solution with the data below.

571283,1,"R","01/15/2002","IBMS,SL","IBM/POSSL5M7"","000019826","000019826",,,571283,"D","D","N","N","N","N","N","N","N","N",1,1,1808946.09,1808946.09,,,1808946.09,1808946.09,,,2,2,2,"USD","USD","08/01/1987","08/01/1987",200,200,"1098","1098",,,"12006","12006",,,"BASIC","BASIC","1098","1098",,,,,,,200,"04/12/2002",17:18:18,"finl421",,,"P",,,,,

This differs from the sample you provided.
So what is your desired output for this example?

With Perl and the CPAN module Text::CSV:

perl -MText::CSV -nle'BEGIN {
  $csv = Text::CSV->new();
    }  
  $csv->parse($_) and 
    print join "|", $csv->fields();
  ' infile
% cat infile
571283,1,"R","01/15/2002","IBMS,SL","IBM/POSSL5M7"","000019826","000019826",,,571283,"D","D","N","N","N","N","N","N","N","N",1,1,1808946.09,1808946.09,,,1808946.09,1808946.09,,,2,2,2,"USD","USD","08/01/1987","08/01/1987",200,200,"1098","1098",,,"12006","12006",,,"BASIC","BASIC","1098","1098",,,,,,,200,"04/12/2002",17:18:18,"finl421",,,"P",,,,,

% perl -MText::CSV -nle'BEGIN {
  $csv = Text::CSV->new();
    }
  $csv->parse($_) and
    print join "|", $csv->fields();
  ' infile                         
571283|1|R|01/15/2002|IBMS,SL|IBM/POSSL5M7",00019826|000019826|||571283|D|D|N|N|N|N|N|N|N|N|1|1|1808946.09|1808946.09|||1808946.09|1808946.09|||2|2|2|USD|USD|08/01/1987|08/01/1987|200|200|1098|1098|||12006|12006|||BASIC|BASIC|1098|1098|||||||200|04/12/2002|17:18:18|finl421|||P|||||

Text::CSV is not included in the standard Perl distribution.

Actually, I'm not sure the output matches the OP's expectations ...

I suppose this code produces the desired result:

perl -MText::CSV -nle'BEGIN {
  $csv = Text::CSV->new({
    escape_char          => "\\"
      });
    }  
  $csv->parse($_) and 
    print join "|", $csv->fields();
  ' infile
% perl -MText::CSV -nle'BEGIN {
  $csv = Text::CSV->new({
    escape_char          => "\\"
      });
    }
  $csv->parse($_) and
    print join "|", $csv->fields();
  ' infile
571283|1|R|01/15/2002|IBMS,SL|IBM/POSSL5M7"|000019826|000019826|||571283|D|D|N|N|N|N|N|N|N|N|1|1|1808946.09|1808946.09|||1808946.09|1808946.09|||2|2|2|USD|USD|08/01/1987|08/01/1987|200|200|1098|1098|||12006|12006|||BASIC|BASIC|1098|1098|||||||200|04/12/2002|17:18:18|finl421|||P|||||

Try this one and see whether it solves the problem:

awk '{gsub(/\",\"/,"|",$0);gsub(/,\"/,"|",$0);gsub(/\",/,"|",$0)}1'

Hi,

radoulov's Perl script was exactly what I was looking for.
But when I tried running it on my AIX box, I got an error.
The cpan file is located in the path below:

 
/usr/opt/perl5/bin>ls -l cpan
-rwxr-xr-x    1 root     system         3850 Feb 16 2004  cpan

Error:

Can't locate Text/CSV.pm in @INC (@INC contains: /usr/opt/perl5/lib/5.8.2/aix-thread-multi /usr/opt/perl5/lib/5.8.2 /usr/opt/perl5/lib/site_perl/5.8.2/aix-thread-multi /usr/opt/perl5/lib/site_perl/5.8.2 /usr/opt/perl5/lib/site_perl .).
BEGIN failed--compilation aborted.

Kindly let me know where to check for Text/CSV.pm.

AFAIK, you first need to install the missing module. One way is to install it through the cpan shell:

# cpan
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.9402)
Enter 'h' for help.

cpan[1]> install Text::CSV

From my Unix shell, how do I check whether Text::CSV is installed?

Use the code below:

sed -e 's/ //g' -e 's/,"/|/g' -e 's/"//g' file

You've already checked that by running radoulov's code, but you can also check it by running this alone:

perl -MText::CSV -e1

http://jclark.org/weblog/2004/09/14/moduletips/
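
The one-liner above reports the result only through its exit status (and an error message if the module is missing). A minimal sketch of a wrapper that prints a readable answer could look like this; the function name check_perl_module is made up for illustration and is not part of Perl or CPAN:

```shell
# Hypothetical helper: report whether a Perl module can be loaded.
# $1 is the module name, e.g. Text::CSV
check_perl_module() {
  # perl -MModule -e1 exits 0 only if the module is found in @INC
  if perl -M"$1" -e1 2>/dev/null; then
    echo "$1 is installed"
  else
    echo "$1 is NOT installed"
    return 1
  fi
}
```

Usage: check_perl_module Text::CSV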

RahulJoshi,

Your code will not handle embedded quotes and commas, or fields with no data (e.g. "hi",,,"hello").

As already stated, you most probably don't have that module installed.
Note that you can install it in a custom location if you don't have root access.
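
If installing the module is not possible at all, a plain awk sketch might be enough for data like the samples in this thread. It is not a real CSV parser and it is not the script from the first post. It assumes that a quoted field ends at the first double quote that is followed by optional blanks and then a comma or the end of the line, and that everything between the outer quotes is kept as is. A field whose content itself contains a quote followed by a comma would be split wrongly, and quoted fields spanning several lines are not handled. The function name csv_to_pipe is made up for illustration.

```shell
# Hypothetical helper: convert loosely quoted CSV to pipe-delimited text.
# Reads the files given as arguments, or stdin if there are none.
csv_to_pipe() {
  awk '
  {
    rest = $0; out = ""; more = 1; n = 0
    while (more) {
      sub(/^[ \t]+/, "", rest)              # drop blanks before the field
      if (substr(rest, 1, 1) == "\"") {
        # Quoted field: look for the closing quote, i.e. the first quote
        # followed by optional blanks and a comma or the end of the line.
        body = substr(rest, 2)
        if (match(body, /"[ \t]*(,|$)/)) {
          field = substr(body, 1, RSTART - 1)
          # A comma at the end of the match means another field follows.
          more = (substr(body, RSTART + RLENGTH - 1, 1) == ",")
          rest = substr(body, RSTART + RLENGTH)
        } else {
          field = body; more = 0            # unterminated quote: take the rest
        }
      } else {
        # Unquoted (possibly empty) field: runs up to the next comma.
        i = index(rest, ",")
        if (i > 0) { field = substr(rest, 1, i - 1); rest = substr(rest, i + 1) }
        else       { field = rest; more = 0 }
        sub(/[ \t]+$/, "", field)           # drop trailing blanks
      }
      out = out (n++ ? "|" : "") field
    }
    print out
  }' "$@"
}
```

Usage: csv_to_pipe infile > outfile

With the sample line from the first post this prints 1234|value|first,second|LDC5"monitor|three""|four, and "hi",,,"hello" becomes hi|||hello.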