To remove double quotes from specific columns

Hi,

I have a requirement: in a CSV file of 30+ fields where every column is wrapped in double quotes, I need to remove the double quotes from certain fields while the other fields remain as they are.

Eg:

"event_log_id","event_id","event_name","event_time","insertion_time","event_args","src_mac","src_name","src_util_id","src_device_type","src_event_count","src_admin_state","src_ops_state","src_location_util_id","src_location_type","src_service_pt_util_id","src_service_pt_type","src_addr_line1","src_addr_line2"

I need to remove the double quotes from the 5th, 6th, 9th, 10th and 14th columns. How can this be achieved?

I've used the command

sed 's/\"//g'

But it removes all the double quotes in the file.

Hi,
maybe:

for i in 14 10 9 6 5; do echo 's/"\([^"]*\)"/\1/'$i ; done | sed -f - file.csv

But all fields must have double quotes, otherwise this will not work correctly.
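Spelled out, the loop above just generates one substitution command per target field and feeds that script to sed on standard input. The descending order matters: each s command replaces the Nth quoted run on the line, and since every field is quoted, the Nth quoted run is the Nth field, so stripping a higher-numbered field first leaves the counts for the lower-numbered fields untouched. A minimal sketch of the equivalent single sed invocation (file.csv here is a 14-field stand-in, not the real data):

```shell
# Stand-in sample for the real CSV: 14 fields, every one quoted.
cat > file.csv <<'EOF'
"a","b","c","d","e","f","g","h","i","j","k","l","m","n"
EOF

# What the loop generates, written out explicitly (descending field numbers):
sed -e 's/"\([^"]*\)"/\1/14' \
    -e 's/"\([^"]*\)"/\1/10' \
    -e 's/"\([^"]*\)"/\1/9'  \
    -e 's/"\([^"]*\)"/\1/6'  \
    -e 's/"\([^"]*\)"/\1/5'  file.csv
```

which prints "a","b","c","d",e,f,"g","h",i,j,"k","l","m",n — fields 5, 6, 9, 10 and 14 with their quotes stripped. This is also why the one-liner breaks if some fields are unquoted: the occurrence count no longer lines up with the field number.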

Would this come close:

sed -r 's/"([^"]*)"/\1/5' file

Unfortunately, you'd need to repeat that for every target field.


As long as there aren't any commas inside your quoted fields, the following is a fairly simple way of getting rid of the quotes in specified fields of a text file named file, using commas as field separators:

awk -v list=5,6,9,10,14 '
BEGIN {	OFS = FS = ","
	n = split(list, fields, FS)	# fields[1..n] hold the target field numbers
}
{	for(i = n; i; i--)
		gsub(/"/, "", $fields[i])	# strip all quotes from that field
}
1' file

With your sample input data, this produces the output:

"event_log_id","event_id","event_name","event_time",insertion_time,event_args,"src_mac","src_name",src_util_id,src_device_type,"src_event_count","src_admin_state","src_ops_state",src_location_util_id,"src_location_type","src_service_pt_util_id","src_service_pt_type","src_addr_line1","src_addr_line2"

which, I assume, is what you wanted.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

a="event_log_id","event_id","event_name","event_time","insertion_time","event_args","src_mac","src_name","src_util_id","src_device_type","src_event_count","src_admin_state","src_ops_state","src_location_util_id","src_location_type","src_service_pt_util_id","src_service_pt_type","src_addr_line1","src_addr_line2"

i=0
for s in `echo $a |sed 's/,/ /g'`
 do
 let i+=1
 if [ $i -ge 2 ]; then echo -n ,; fi
 if [ $i = 5 -o $i = 6 -o $i = 9 -o $i = 10 -o $i =14 ]
 then
   echo -n $s |sed 's/\"//g'
 else
  echo -n $s
 fi 
done

I don't know why one would want to use the above code instead of the other suggestions that have been provided in this thread. The request presented in this thread is to process a file in CSV format, with one line from a sample file provided to show the desired behavior. Nowhere is it suggested that the files to be processed will only contain one line.

Note also that there is a syntax error in the test $i =14 above. That portion of the conditional expression would need to be changed to $i = 14 (note the added <space> between the = and the 14) to get rid of the syntax error.

If the syntax error is corrected, the above code (instead of invoking awk once per file or sed once per file assuming that invocation of sed includes five substitution commands) invokes sed five times for each line in each file to be processed. With anything more than a few lines, invoking sed so often would use significantly more system resources and cause a script using this method to run slower than scripts using the other suggestions provided in this thread.
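For what it's worth, if one did want a per-field shell loop, the repeated sed invocations can be avoided entirely with parameter expansion. A sketch only, not any poster's actual code — it assumes bash and, like the other suggestions, no commas inside quoted fields (file.csv is stand-in sample data):

```shell
# Stand-in sample data: 14 fields, every one quoted.
cat > file.csv <<'EOF'
"a","b","c","d","e","f","g","h","i","j","k","l","m","n"
EOF

# Strip the quotes from fields 5, 6, 9, 10 and 14 of every line
# without spawning an external command per field.
while IFS=, read -ra f; do
  for i in 5 6 9 10 14; do
    f[i-1]=${f[i-1]//\"/}           # bash arrays are zero-based
  done
  (IFS=,; printf '%s\n' "${f[*]}")  # re-join the fields with commas
done < file.csv
```

This still processes the file line by line in the shell, so a single awk invocation remains the faster choice for large files; the point is only that the five-sed-calls-per-line cost is avoidable even within this approach.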