sed delete leading spaces in a .csv if not in a string

gary_w · July 6, 2012, 10:54am

Solaris, ksh
I have a .csv file I am trying to clean up before loading into the database. The file contains comma-separated columns, some of which have leading spaces that I need to remove. The trouble is, some columns that should not be touched are strings which happen to contain the same pattern. How do I search/replace a comma-space pattern only when it is NOT inside a string delimited by double quotes?

Input row (null first column):

,12345,"first last, MD",Yes,<space>not in use,<space>Other

Desired output:

,12345,"first last, MD",Yes,not in use,Other

The comma-space in the third field (name) should be left alone, but the other columns should have the leading space removed.

This command, of course, also affects the name field, which I want to leave alone.

sed 's/, /,/g' x.dat

Note: the third column will be the only column that could be a string. If sed could only operate on columns > 3, that approach would work too.

Thanks for any advice!

Corona688 · July 6, 2012, 11:07am

Your columns are not comma-separated.

You can kludge awk to separate on " and then replace inside...
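One way to sketch that idea (this is not Corona688's actual code, which isn't shown here): split each line on the double quote, so that odd-numbered pieces lie outside quotes and even-numbered pieces lie inside, then strip the spaces after commas only in the odd pieces. The function name trim_outside_quotes is made up for illustration, and the sketch assumes quotes are balanced on every line.

```shell
# Hypothetical helper: remove spaces that follow a comma, but only outside "..."
trim_outside_quotes() {
  awk -F'"' -v OFS='"' '{
    # With " as the field separator, fields 1, 3, 5, ... are outside quotes
    for (i = 1; i <= NF; i += 2)
      gsub(/, +/, ",", $i)
    # Fields are rejoined with OFS, so the quotes themselves are preserved
    print
  }' "$@"
}
```

A line with no quotes at all is a single field, so it is cleaned as a whole. On Solaris the default /usr/bin/awk is likely the old awk without gsub, so nawk or /usr/xpg4/bin/awk would probably be needed in place of awk.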

elixir_sinari · July 6, 2012, 11:52am

Using these assumptions, the following should work:

sed 'h;s/^\(.*"[^"]*"\).*/\1/;x;s/^\(.*"[^"]*"\)\(.*\)/\2/;s/, /,/g;x;G;s/\n//' inputfile

EDIT: I've assumed that the 3rd field is always enclosed in double quotes.
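For anyone trying to follow the one-liner, here is the same command spread over several lines with comments. It is only a readability sketch; the wrapper name clean_after_quoted is made up for illustration, and it carries the same assumption that every line contains a quoted field.

```shell
# Hypothetical wrapper around the same sed commands, one per line
clean_after_quoted() {
  sed '
    # save a copy of the line in the hold space
    h
    # pattern space: keep everything up to and including the last quoted string
    s/^\(.*"[^"]*"\).*/\1/
    # swap: the pattern space holds the original line again
    x
    # keep only the text that follows the quoted string
    s/^\(.*"[^"]*"\)\(.*\)/\2/
    # strip the space after each comma in that tail
    s/, /,/g
    # swap back, then append the cleaned tail to the head
    x
    G
    # remove the newline that G inserted between head and tail
    s/\n//
  ' "$@"
}
```

Usage would be clean_after_quoted inputfile. Note that on a line with no quotes neither of the two splitting substitutions matches, so the head and the tail are both the full line and the output appears to contain the line twice; that is why the quoting assumption matters.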

Scrutinizer · July 6, 2012, 12:21pm

Like Corona suggests, I would use " as the record separator. You could then convert the commas outside the quotes to semicolons, for example:

awk 'NR%2{gsub(/,/,";")}1' RS=\" ORS= file

and take it from there, for example:

$ awk 'NR%2{gsub(/,/,";")} {gsub(/; */,";")}1' RS=\" ORS= file
;12345;first last, MD;Yes;not in use;Other
$

gary_w · July 6, 2012, 1:57pm

Thank you all for your suggestions. I didn't make it clear enough that the third field does not necessarily have quotes.
At least this is a one-time task. I have taken steps to ensure the source trims before giving me the data the next time!