utility

Hi experts,

Can you please help me remove the delimiters that appear within double quotes in a CSV file?

Input:

a,"bnn,",dgd, "sagfh,dj",ad

Output:

a,"bnn",dgd, "sagfhdj",ad

There are many fields in each row, and there are millions of rows.

Thanks in advance.
-subhendu-

If you have Perl, you can use the Text::CSV module:

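#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV;

# allow_loose_quotes tolerates quote characters inside unquoted fields
my $csv  = Text::CSV->new({ allow_loose_quotes => 1 });
my $file = "file";

open(my $fh, "<", $file) or die "Cannot open $file: $!";
while (my $line = <$fh>) {
    if ($csv->parse($line)) {
        # Print each parsed field on its own line
        foreach my $col ($csv->fields()) {
            print $col . "\n";
        }
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close($fh);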

output:

# ./test.pl
a
bnn,
dgd
 "sagfh
dj"
ad

An approach with awk:

awk -F "\"" '{
  for(i=2;i<NF;i+=2) {
    gsub(",", "", $i)
  }
}1' OFS="\"" file
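
To see why the loop only touches the even-numbered fields, it may help to print how awk splits a line when the double quote is the field separator. This is only a small sketch; the function name show_quote_fields is made up for illustration, and it assumes the quotes on each line are balanced and that no quoted field contains a newline.

```shell
# Show how awk splits one line when '"' is the field separator.
# Even-numbered fields (2, 4, ...) are the text between a pair of quotes;
# odd-numbered fields are everything outside the quotes.
# show_quote_fields is a hypothetical helper name, not part of awk.
show_quote_fields() {
  printf '%s\n' "$1" | awk -F '"' '{
    for (i = 1; i <= NF; i++) {
      where = (i % 2 == 0) ? "inside " : "outside"
      printf "%d %s [%s]\n", i, where, $i
    }
  }'
}

show_quote_fields 'a,"bnn,",dgd, "sagfh,dj",ad'
```

For the sample line this prints five fields, and only fields 2 ([bnn,]) and 4 ([sagfh,dj]) are reported as inside quotes, which are exactly the ones the gsub() call strips commas from.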

Regards

The "," inside the quotes is gone:

 # awk -F "\"" '{ for(i=2;i<NF;i+=2) {gsub(",", "", $i) }}1' OFS="\"" file
a,"bnn",dgd, "sagfhdj",ad

That's what the OP expected; check the desired output in the first post.

Regards

My bad, I misinterpreted the question.

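Perl code (this replaces each comma inside double quotes with "|"; use an empty replacement to remove it instead):

my $str='a,"bnn,",dgd, "sagfh,dj",ad,"aa,bb,cc",asdf,asdf,"sdaf,asdfsdf,asdff",asdf';
print $str,"\n";
# Match a comma only if an odd number of double quotes follows it
# up to the end of the string, i.e. the comma sits inside a quoted field.
$str=~s/,
(?=
  (
    (?:[^"]*"[^"]*$)
   |
    (?:[^"]*"[^"]*(?:"[^"]*"[^"]*)*[^"]*$)
   )
)/|/xg;
print $str;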

output:

a,"bnn,",dgd, "sagfh,dj",ad,"aa,bb,cc",asdf,asdf,"sdaf,asdfsdf,asdff",asdf
a,"bnn|",dgd, "sagfh|dj",ad,"aa|bb|cc",asdf,asdf,"sdaf|asdfsdf|asdff",asdf
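
For comparison, the same result can probably be had from the awk approach earlier in the thread by passing the replacement text in as a variable. The sketch below is untested on large files; the function name replace_quoted_commas and the variable rep are made up for illustration. It assumes balanced quotes on every line, no newlines inside quoted fields, and a replacement that contains no "&" or backslash (both are special to gsub() and awk -v).

```shell
# replace_quoted_commas REPLACEMENT [FILE...]
# Replaces every comma inside double quotes with REPLACEMENT;
# pass an empty string to delete the commas instead.
# Reads standard input when no FILE is given.
# (Hypothetical helper name; a text after an unterminated quote is left alone.)
replace_quoted_commas() {
  rep=$1
  shift
  awk -F '"' -v OFS='"' -v rep="$rep" '{
    # Even-numbered fields are the text between a pair of quotes.
    for (i = 2; i < NF; i += 2)
      gsub(/,/, rep, $i)
    print
  }' "$@"
}

# Same effect as the Perl substitution on the thread's sample line:
printf '%s\n' 'a,"bnn,",dgd, "sagfh,dj",ad' | replace_quoted_commas '|'
# And the removal the original post asked for:
printf '%s\n' 'a,"bnn,",dgd, "sagfh,dj",ad' | replace_quoted_commas ''
```

Because awk reads one line at a time and keeps no state between lines, this should cope with millions of rows without holding the file in memory.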