Removing comma "," in a field value in csv file

Hi,

I have csv file with records as below. Now i have remove any comma in the filed value because that creates problem when i feed this file to an application.

for example below are two sample records, the second record have a comma in "Salesforce.com, Inc." field, now i have to remove this comma. Please advise a method to do this.

"20110922","WSJ WE","A001","1NA",148,0,0,148,0,"United States","PG01","ORACLE","Exadata","GENL","B160","ZENITH",291001168,"SECA","GENL"

"20110919","WSJ EE","A001","1NA",148,0,0,148,0,"United States","PG01","Salesforce.com, Inc.","Salesforce chatter","GENL","B241","Salesforce.com, Inc.",291008167,"SECA","GENL"

Thanks,
Naga

 
nawk '{sub("com,","com",$0);print}' input.csv
$ sed 's,.com\, ,.com ,g' infile
"20110919","WSJ EE","A001","1NA",148,0,0,148,0,"United States","PG01","Salesforce.com Inc.","Salesforce chatter","GENL","B241","Salesforce.com Inc.",291008167,"SECA","GENL"

how to make it generic? the comma may not precede ".com" always. It can be in any field.

With GNU awk 4 (removes the comma from any field):

awk '{ 
  for (i = 0; ++i <= NF; )
   sub(/,/, x, $i)  
  }1' OFS=, FPAT='([^,]+)|("[^"]+")' infile

With Perl:

perl -MText::ParseWords -nle'
   @f = parse_line(",",2, $_);
   tr/,//d for @f;
   print join ",", @f
  ' infile 

@radoulov

awk dint work and perl gave the below error

syntax error at -e line 1, near "tr/,//d for "
Execution of -e aborted due to compilation errors.

Do you have GNU awk 4?

Which shell are you using? Try running the command with a shell different than the (t)csh.
Could you please post the exact command and the complete output that you get?

am using ksh.

perl -MText::ParseWords -nle ' @f = parse_line(",",2, $_);    tr/,//d for @f;    print join ",", @f  ' tmp.csv

here is the error

cmrwc7:/wcdev/home/db_maint> perl -MText::ParseWords -nle'   @f = parse_line

>    @f = parse_line(",",2, $_);   tr/,//d for @f;

   print join ",", @f

  ' tmp.csv

>    tr/,//d for @f;  ' tmp.csv

>    print join ",", @f  ' tmp.csv

>   ' tmp.csv

syntax error at -e line 3, near "tr/,//d for "

Execution of -e aborted due to compilation errors.

Try the following:

  1. Create a new file with the following content:
#!/usr/bin/perl -w

use strict;
use Text::ParseWords;

while (<>) {
  my @f = parse_line(',', 2, $_);
  tr/,//d for @f;
  print join ',', @f;
  }

Assign the proper permissions:

chmod u+x your_new_file

Execute the script like this:

./your_new_file tmp.csv

Another one using awk

awk -F"," '{for(i=1;i<=NF;i++){t=gsub(/"/,"\"",$i);if(t==1){$i=$i $(++i)}printf $i OFS;if(i==NF)print "\n"}}' 
OFS="," input_file | sed 's/,$//g'

--ahamed

---------- Post updated at 05:28 AM ---------- Previous update was at 05:26 AM ----------

well, this will take care of only 1 comma if it occurs within "...".

--ahamed

Another approach could be:

awk -F\" '{for(i=2;i<=NF;i+=2)gsub(",",x,$i)}1' OFS=\" file > newfile
1 Like
sed 's/",/&_X_/g;s/\("[^ ]*\),\( [^ ]*"\),_X_/\1\2,/g;s/_X_//g' infile

Yet another way with [n]awk...

awk -F, '{
    for (i=1; i<=NF; i++) {
        if (s) {
           if ($i ~ /^[^"].*\"$/) {printf("%s,", s $i); s=0}
           else s = s $i
        }
        else if ($i ~ /^["].*[^"]$/) s = $i
        else printf("%s%s", $i, i==NF?"\n":FS)
    }
}' file

The following approach handles any number of embedded commas (even when the comma-delimited, double-quoted field contains both commas and escaped quotes). It assumes that the input does not contain any bytes with value of 0x01 (usually a safe assumption).

tr '\n,' '\001\n' < d | 
awk -F\" '{while ((NF-1)%2) {o=$0; getline; $0=o$0}; printf("%s%s", r, $0); r=ORS}' |
tr '\n\001' ',\n'

Basic test:

$ cat d
0
"A,A"
0,"A,A",0
" ,,"",,"", A,,,,,A ,"" A,,,,,",0,0
$ tr '\n,' '\001\n' < d |
> awk -F\" '{while ((NF-1)%2) {o=$0; getline; $0=o$0}; printf("%s%s", f, $0); f=ORS}' | tr '\n\001' ',\n' 
0
"AA"
0,"AA",0
" """" AA "" A",0,0

Regards,
Alister

---------- Post updated at 01:20 PM ---------- Previous update was at 12:42 PM ----------

Nice. Your approach can handle the same situations as mine but is much more concise and efficient. Looking back, I'm not certain how I managed to arrive at such a needlessly complex solution.

It's an obvious, minor typo, but in case anyone trying to use this code experiences an infinite loop, note that the variable c should be i.

Regards,
Alister

1 Like