Replacing all but the first and last double quote in a line with a single quote with awk

From:

1,2,3,4,5,This is a test
6,7,8,9,0,"This, is a test"
1,9,2,8,3,"This is a ""test"""
4,7,3,1,8,""""

To:
1,2,3,4,5,This is a test
6,7,8,9,0,"This; is a test"
1,9,2,8,3,"This is a ''test''"
4,7,3,1,8,"''"

Is there an easy syntax I'm overlooking? There will always be an odd number of quotes (0,2,4,6,etc). If present the first quote will follow a comma and if present the final quote will precede a LF.

Prior to this, I have replaced any LF (but not CR+LF) with | , than replaced any CR+LF with LF and replaced any , in the rightmst field with ; (by using a loop to look for more csv columns and OFS=; ). I was not aware of the "" mean " thing in the .csv standard however.

Mike

Try:

awk '{gsub(/"/,q,$NF); gsub("^" q "|" q "$","\"",$NF)}1' FS=, OFS=, q=\'  file
1 Like

I think I understand the code but what does the 1 do? Also why is the or in quotes?

Unfortunately I often have single quotes at the end of a line, so this often fails. The following example is based on actual data:

echo "1,2,3,4,5,retested after moving sample 8'" | awk '{gsub(/"/,q,$NF); gsub("^" q "|" q "$","\"",$NF)}1' FS=, OFS=, q=\'
1,2,3,4,5,retested after moving sample 8"

I also think this would break with commas inside quotes (as I initially posed the question before editing). Fortunately I already replaced them with semicolons.

Mike

PS. kicked myself many times for not using .tsb instead of .csv at the beginning of this project . . .

Another way of doing this is like this, which is perhaps a bit clearer:

awk '{gsub(/"/,q,$NF); gsub(r,"\"",$NF)}1' FS=, OFS=, q=\' r="^'|'$" file

Which replaces all the double quotes with single quotes and then changes the first and the last single quote in the last field back to double quotes..
The 1 means: the condition is true, no action was specified , so perform the default action, which is {print $0}

If you have single quotes in you input, then a third, intermediate character is needed that is not in your input. For this we can use any character that is not in your input. This example uses a newline character (which is equal to RS), which cannot be in the input, since awk is reading line by line and strips the newline.

awk '{gsub(/"/,RS,$NF); gsub(r,"\"",$NF); gsub(RS,q,$NF)}1' FS=, OFS=, q=\' r="^\n|\n$" file

With comma's inside double quotes this becomes more complicated, since you would need to combine with the earlier solution..

--
Edit: only just noted that in you original example comma's inside double quotes need to be converted to semicolons, but you already changed them, like you said..

I got around using an intermediate character (in the past I have used some of the old ASCII punch card/paper tape control characters 28-32) by brute forcing first and last quote removal. There should be no need to exclude a character now.

awk -F, 'BEGIN { OFS=","; comm=22}                        #Convert any "," in comments to ";"
             { for(i=comm+1; i<=NF; i++) {$comm=$comm";"$i}
               if (NF>comm) NF=comm; print $0 }' |
    awk -F, 'BEGIN { OFS = ","
                     # Read in Known Fail List
                     getline < "'"$failListFile"'"; getline < "'"$failListFile"'"; getline < "'"$failListFile"'" # Header Rows
                     while (getline < "'"$failListFile"'") { split( $0, a, ","); i=a[1]a[2]a[3]; gsub ( " ", "", i ); failMessage=a[6]
                     failStart=a[4]? a[4] : "0000 01 01 00 00 00"
                     failEnd=a[5]? a[5] : "9999 12 31 23 59 59" }
                     close("'"$failListFile"'")}
             !($7 == "" || $9 == "" || $10 == "" || $11 == "" || $12 == "") {
                split($7,a," "); split(a[1],d,"/"); split (a[2],t,":")
                month = sprintf("%02d",d[1]); day = sprintf("%02d",d[2])    #All 2 digits
                year = 2000 + d[3] % 100                                    #Force 4 digit year
                hour = sprintf("%02d",t[1]); min = sprintf("%02d",t[2])
                date = month"/"day"/"year; time = hour":"min
                $7 = date" "time
                if ( $19 == "Y" ) $19 = "V";        #Allowing for older Raw Data Files and Archives
                else if ( $19 == "N" ) $19 = "I";   #to use the older YNE vs VIE Valid column.
                else if ( $19 =="" ) $19 = "I"      #if valid column manually erased, treat as Invalid
           #     $22 = $22 ~ /^\".*\"$/ ? $22 : "\""$22"\"" # put quotes around comment if Excel did not already
                gsub(/^\"/,"",$22); gsub(/\"$/,"",$22); gsub (/\"\"/,"\x27\x27", $22); $22 = "\""$22"\"" #remove wrapping quotes, Change "" (.csv representation of ") to '', rewrap in quotes
                i=$1$6$8; gsub ( " ", "", i )
                if ( $18 == "FAIL" && i in failMessage ) { now = mktime(year" "month" "day" "hour" "min" 00")
                    if ( now >= mktime(failStart) && now <= mktime(failEnd) ) {
                        if ($22 == "\"\"") gsub ( "\"$", "Known Fail: "failMessage"\"", $22 )
                        else gsub ( "\"$", "|Known Fail: "failMessage"\"", $22 ) } }
                print $0
             }'

Took me a long time to figure out that you could not escape ' characters in AWK with \' or a whole bunch of other things with and without a variable declaration (but you can do so in BASH with the awk -v option), so I used \x27

Excel column 22:

Fake data: comment with "quotes"
Fake data, comment, with, commas,,,

.CSV column 22:

"Fake data: comment with ""quotes"""
"Fake data, comment, with, commas,,,"

Output of script column 22:

"Fake data: comment with ''quotes''"
"Fake data, comment, with, commas,,,"

Mike

You can do that with awk, it is not an awk thing, it is a shell thing.. In shell you cannot escape characters that are in single quotes. And since the actual awk script is enclosed in single quotes...

What you can do is have a file that contains the awk script and call it like this:

awk -f awk.script

Then you do not need to worry about escaping quotes...

For example:

$ awk 'BEGIN{print "This is a quote: '\''"}' 
This is a quote: '

$ cat awk.script
BEGIN {
  print "This is a quote: '"
} 

$ awk -f awk.script
This is a quote: '