Removing comma "," in a field value in csv file

anaga · September 29, 2011, 6:14am

Hi,

I have csv file with records as below. Now i have remove any comma in the filed value because that creates problem when i feed this file to an application.

for example below are two sample records, the second record have a comma in "Salesforce.com, Inc." field, now i have to remove this comma. Please advise a method to do this.

"20110922","WSJ WE","A001","1NA",148,0,0,148,0,"United States","PG01","ORACLE","Exadata","GENL","B160","ZENITH",291001168,"SECA","GENL"

"20110919","WSJ EE","A001","1NA",148,0,0,148,0,"United States","PG01","Salesforce.com, Inc.","Salesforce chatter","GENL","B241","Salesforce.com, Inc.",291008167,"SECA","GENL"

Thanks,
Naga

itkamaraj · September 29, 2011, 6:21am

 
nawk '{sub("com,","com",$0);print}' input.csv

jayan_jay · September 29, 2011, 6:30am

$ sed 's,.com\, ,.com ,g' infile
"20110919","WSJ EE","A001","1NA",148,0,0,148,0,"United States","PG01","Salesforce.com Inc.","Salesforce chatter","GENL","B241","Salesforce.com Inc.",291008167,"SECA","GENL"

anaga · September 29, 2011, 6:34am

how to make it generic? the comma may not precede ".com" always. It can be in any field.

radoulov · September 29, 2011, 6:40am

With GNU awk 4 (removes the comma from any field):

awk '{ 
  for (i = 0; ++i <= NF; )
   sub(/,/, x, $i)  
  }1' OFS=, FPAT='([^,]+)|("[^"]+")' infile

With Perl:

perl -MText::ParseWords -nle'
   @f = parse_line(",",2, $_);
   tr/,//d for @f;
   print join ",", @f
  ' infile

anaga · September 29, 2011, 6:54am

@radoulov

awk dint work and perl gave the below error

syntax error at -e line 1, near "tr/,//d for "
Execution of -e aborted due to compilation errors.

radoulov · September 29, 2011, 6:56am

Do you have GNU awk 4?

Which shell are you using? Try running the command with a shell different than the (t)csh.
Could you please post the exact command and the complete output that you get?

anaga · September 29, 2011, 7:23am

am using ksh.

perl -MText::ParseWords -nle ' @f = parse_line(",",2, $_);    tr/,//d for @f;    print join ",", @f  ' tmp.csv

here is the error

cmrwc7:/wcdev/home/db_maint> perl -MText::ParseWords -nle'   @f = parse_line

>    @f = parse_line(",",2, $_);   tr/,//d for @f;

   print join ",", @f

  ' tmp.csv

>    tr/,//d for @f;  ' tmp.csv

>    print join ",", @f  ' tmp.csv

>   ' tmp.csv

syntax error at -e line 3, near "tr/,//d for "

Execution of -e aborted due to compilation errors.

radoulov · September 29, 2011, 7:31am

Try the following:

Create a new file with the following content:

#!/usr/bin/perl -w

use strict;
use Text::ParseWords;

while (<>) {
  my @f = parse_line(',', 2, $_);
  tr/,//d for @f;
  print join ',', @f;
  }

Assign the proper permissions:

chmod u+x your_new_file

Execute the script like this:

./your_new_file tmp.csv

ahamed101 · September 29, 2011, 8:28am

Another one using awk

awk -F"," '{for(i=1;i<=NF;i++){t=gsub(/"/,"\"",$i);if(t==1){$i=$i $(++i)}printf $i OFS;if(i==NF)print "\n"}}' 
OFS="," input_file | sed 's/,$//g'

--ahamed

---------- Post updated at 05:28 AM ---------- Previous update was at 05:26 AM ----------

well, this will take care of only 1 comma if it occurs within "...".

--ahamed

Franklin52 · September 29, 2011, 8:41am

Another approach could be:

awk -F\" '{for(i=2;i<=NF;i+=2)gsub(",",x,$i)}1' OFS=\" file > newfile

ygemici · September 29, 2011, 10:40am

sed 's/",/&_X_/g;s/\("[^ ]*\),\( [^ ]*"\),_X_/\1\2,/g;s/_X_//g' infile

shamrock · September 29, 2011, 11:41am

Yet another way with [n]awk...

awk -F, '{
    for (i=1; i<=NF; i++) {
        if (s) {
           if ($i ~ /^[^"].*\"$/) {printf("%s,", s $i); s=0}
           else s = s $i
        }
        else if ($i ~ /^["].*[^"]$/) s = $i
        else printf("%s%s", $i, i==NF?"\n":FS)
    }
}' file

alister · September 29, 2011, 1:20pm

The following approach handles any number of embedded commas (even when the comma-delimited, double-quoted field contains both commas and escaped quotes). It assumes that the input does not contain any bytes with value of 0x01 (usually a safe assumption).

tr '\n,' '\001\n' < d | 
awk -F\" '{while ((NF-1)%2) {o=$0; getline; $0=o$0}; printf("%s%s", r, $0); r=ORS}' |
tr '\n\001' ',\n'

Basic test:

$ cat d
0
"A,A"
0,"A,A",0
" ,,"",,"", A,,,,,A ,"" A,,,,,",0,0
$ tr '\n,' '\001\n' < d |
> awk -F\" '{while ((NF-1)%2) {o=$0; getline; $0=o$0}; printf("%s%s", f, $0); f=ORS}' | tr '\n\001' ',\n' 
0
"AA"
0,"AA",0
" """" AA "" A",0,0

Regards,
Alister

---------- Post updated at 01:20 PM ---------- Previous update was at 12:42 PM ----------

Nice. Your approach can handle the same situations as mine but is much more concise and efficient. Looking back, I'm not certain how I managed to arrive at such a needlessly complex solution.

It's an obvious, minor typo, but in case anyone trying to use this code experiences an infinite loop, note that the variable c should be i.

Regards,
Alister