Replace Double quotes within double quotes in a column with space while loading a CSV file

Hi All,

I'm unable to load the data using SQL*Loader when a field contains double quotes inside its enclosing double quotes, because the fields are optionally enclosed by double quotes.

Sample Data :

"221100",138.00,"D","0019/1477","44012075","49938","49938/15043000","Television - 22" Refurbished - Airwave","Supply of Delivery & Collection 1st Unit (1), Delivery & Collection Additions (1), Whitbread Refurb LCD (2)","Airwave Europe Ltd","15/04/2015","2520",""

Desired output :

"221100",138.00,"D","0019/1477","44012075","49938","49938/15043000","Television  - 22 Refurbished - Airwave","Supply of Delivery & Collection 1st  Unit (1), Delivery & Collection Additions (1), Whitbread Refurb LCD  (2)","Airwave Europe Ltd","15/04/2015","2520",""

I have checked many threads posted on this site and tried:

sed 's/\([^",]\)"\([^",]\)/\1\2/' < infile > outfile
perl -anle 'my @fields = ($_ =~ /(?:^|,)(".*?"|[^,]*?)(?=,|$)/g);foreach my $f(@fields){$f=~s/"//g;$f=sprintf("\"%s\"",$f);}my $line=join(",",@fields);print $line' file

But it didn't work: if the last column of the data is blank, that column is changed as well, and I get the output below.

"221100",138.00,"D","0019/1477","44012075","49938","49938/15043000","Television  - 22 Refurbished - Airwave","Supply of Delivery & Collection 1st  Unit (1), Delivery & Collection Additions (1), Whitbread Refurb LCD  (2)","Airwave Europe Ltd","15/04/2015","2520","

Could anyone help me fix this issue?

Regards,
Lavanya.

Your sed snippet works for me: it removes exactly the " after the 22. Why don't you like that solution?

Change the outer double quotes to an unprintable character (here a literal ^B; remember that this is not a caret followed by B, but ctrl-B). Then eliminate the remaining double quotes and replace all ^Bs with double quotes.

sed -e 's/^"/^B/' -e 's/"$/^B/' -e 's/",/^B,/g' -e 's/,"/,^B/g' -e 's/"//g' -e 's/^B/"/g'
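If typing a literal ctrl-B is awkward, one option is to let the shell generate it. The following is only a sketch of the six substitutions above, assuming a POSIX shell; the variable STX and the function fix_quotes are names made up for this illustration and are not part of the original suggestion.

```shell
# STX holds the single control-B character (octal 002), produced by printf
# so that it never has to be typed on the keyboard.
STX=$(printf '\002')

# Read CSV lines on standard input, write repaired lines to standard output.
fix_quotes() {
    sed -e "s/^\"/$STX/" \
        -e "s/\"\$/$STX/" \
        -e "s/\",/$STX,/g" \
        -e "s/,\"/,$STX/g" \
        -e 's/"//g' \
        -e "s/$STX/\"/g"
}

# Demonstration on a shortened, made-up line with one stray inner quote.
printf '%s\n' '"221100",138.00,"Television - 22" Refurbished","2520",""' | fix_quotes
```

For real data the function would be used as fix_quotes < infile > outfile. Like the original command, it assumes that a stray inner quote is never directly next to a comma.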

Hi RudiC,

It doesn't work for me, as the data looks something like the following:

Input:

   560003_07.28,292.47,"D","1073/1220","44536370","16520","16520/14103000","Vacuum   - Upright (c) - "Vaclensa","Supply of BS36 Upright (3yr NO QUIBBLE   Guarantee) (1)","Vaclensa PLC","03/10/2014","2510","PINON15N001" 

After using

sed 's/\([^",]\)"\([^",]\)/\1\2/' < input_file > output_file

Output:

   560003_07.28,292.47,"D","1073/1220","44536370","16520","16520/14103000","Vacuum   - Upright (c) - Vaclensa","Supply of BS36 Upright (3yr NO QUIBBLE   Guarantee) (1)","Vaclensa PLC","03/10/2014","2510","PINON15N001 

Expected Output:

   560003_07.28,292.47,"D","1073/1220","44536370","16520","16520/14103000","Vacuum   - Upright (c) - Vaclensa","Supply of BS36 Upright (3yr NO QUIBBLE   Guarantee) (1)","Vaclensa PLC","03/10/2014","2510","PINON15N001" 

Can you please help me out with this?

Regards,
Lavanya.

---------- Post updated at 01:04 AM ---------- Previous update was at 01:01 AM ----------

Hi Cjcox,

Can you please confirm whether I can write the whole code you provided as a single sed command?
I'm new to this technology and trying to learn.

Regards,
Lavanya.

Yes Lavanya, you can use a single invocation of the sed utility to perform all six sed substitute commands as shown in the suggestion cjcox provided.

Obviously, you will have to provide input for that command to process, and, unless you just want the output to go to standard output, you'll need to redirect the output.

Hi Don,

I tried using:

sed -e 's/^"/^B/' -e 's/"$/^B/' -e 's/",/^B,/g' -e 's/,"/,^B/g' -e 's/"//g' -e 's/^B/"/g'
but I can see that all the double quotes are replaced by ^B, and the last double quote in the data file is also eliminated.

Input:

   221100,37.20,"C","0073/1454","44019120","16395","16395/14103000","Safety   Workwear - "Screwfix","","Screwfix Direct   Ltd","10/10/2014","2520","" 

Output:

   ^B221100^B,37.20,^BC^B,^B0073/1454^B,^B44019120^B,^B16395^B,^B16395/14103000^B,^BSafety   Workwear - Screwfix^B,^B^B,^BScrewfix Direct Ltd^B,^B10/10/2014^B,^B2520^B,^B 

Expected Output:

   221100,37.20,"C","0073/1454","44019120","16395","16395/14103000","Safety   Workwear - Screwfix","","Screwfix Direct   Ltd","10/10/2014","2520","" 

As you can see in the output, the last column, which has a null value, has been replaced with only a single double quote.

Please help me to resolve this issue.

Regards,
Lavanya.
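One possible cause of the lost final quote, which nothing in this thread confirms, is that the lines end in a space or a carriage return (for example, in a file saved with Windows line endings). In that case the quote is no longer the last character of the line, so the "$ pattern does not match it and the later s/"//g deletes it. A minimal sketch to check for that and strip it, assuming a POSIX shell; count_trailing and strip_trailing are hypothetical helper names:

```shell
# Count the input lines that end in whitespace or a carriage return.
count_trailing() {
    grep -c '[[:space:]]$'
}

# Remove trailing whitespace (including a carriage return) from every line.
strip_trailing() {
    sed 's/[[:space:]]*$//'
}

# Demonstration on two made-up lines; only the first has trailing characters.
printf '"2520","" \r\n"2520",""\n' | count_trailing
printf '"2520","" \r\n' | strip_trailing
```

If trailing characters are in fact the cause, running the data through strip_trailing before the six-substitution sed command would let "$ match again. The command sed -n l on the file also makes such characters visible, since it prints a carriage return as \r and marks each line end with $.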

---------- Post updated at 08:29 AM ---------- Previous update was at 08:15 AM ----------

Actually, I have a problem only with the 8th column of the data file. Can we make the change only for column 8: check whether there are any double quotes between the enclosing double quotes and replace them with a space?

Go back and look at message #3 in this thread again. You seem to have used the two characters circumflex (^) and capital letter B (B) instead of the single character that you get by pressing and holding the control key (control, ctl, or cntl, depending on your keyboard manufacturer) while you press and release the B key. This key combination shows up on your editing screen as ^B if you are using common UNIX/Linux/POSIX editing tools like vi.

If, for some reason, you are unable to use the ctl-B key combination to create that character, you can replace all occurrences of that character in the sed command line with any other character that CANNOT appear as a legitimate character in your input file. However, you cannot use a character that has a special meaning in a basic regular expression or in a sed s command replacement string.
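As a rough illustration of that alternative, the sketch below uses a tilde as the stand-in character and refuses to run if the tilde already occurs in the data. It assumes a POSIX shell; PLACEHOLDER and fix_quotes_with are hypothetical names, and the tilde is only an example choice that may not suit every file.

```shell
# PLACEHOLDER must be a single character that never occurs in the data and
# has no special meaning to sed here (so not / \ & . * [ ] ^ or $).
PLACEHOLDER='~'

# $1 is the input file; the repaired lines go to standard output.
fix_quotes_with() {
    # Stop if the placeholder occurs anywhere in the input file.
    if grep -qF -- "$PLACEHOLDER" "$1"; then
        echo "placeholder '$PLACEHOLDER' occurs in $1; choose another" >&2
        return 1
    fi
    sed -e "s/^\"/$PLACEHOLDER/" \
        -e "s/\"\$/$PLACEHOLDER/" \
        -e "s/\",/$PLACEHOLDER,/g" \
        -e "s/,\"/,$PLACEHOLDER/g" \
        -e 's/"//g' \
        -e "s/$PLACEHOLDER/\"/g" "$1"
}

# Demonstration on a temporary file holding one shortened, made-up line.
tmp=$(mktemp)
printf '%s\n' '"16395","Safety Workwear - "Screwfix","","2520",""' > "$tmp"
fix_quotes_with "$tmp"
rm -f "$tmp"
```

The check matters because any tilde already present in the data would be turned into a double quote by the last substitution.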