I have the below sample file each field seperated with comma - need to extract the 5th field which is "txt4 "(tst)"" and replacing the double quotes by doubling the quotes within that field i.e. "txt4 ""(tst)"""
Need to achieve this o/p using awk
sample csv file
"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)""
desired o/p
"this is txt1","this is txt2",3,"this txt3","txt4 ""(tst)"""
This is quite tricky. This might work if there are never commas inside double quotes, otherwise you would need something else..
awk '
{
for(i=1; i<=NF; i++) { # for every comma-separated field:
gsub(/^"|"$/,RS,$i) # Replace the leading and closing double quote by a newline
gsub(/"/,"&&",$i) # Replace remaining quotes by 2 double quotes
}
gsub(RS,"\"") # Turn the newlines back into the original double quotes on the whole line
print # Print the line
}
' FS=, OFS=, file
a somewhat similar technique (with the same disclaimer caveat) if you have one of the most recent gawk-s with the FPAT support . Adjusted based on the gawk manual example
Here's the test harness
BEGIN {
qq="\""
FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
print "NF = ", NF
for (i = 1; i <= NF; i++) {
gsub(qq "[^"qq"]+"qq qq, qq "&" qq, $i)
printf("$%d = <%s>\n", i, $i)
}
}