I have pipe-delimited (|) data in a file, and all the fields are enclosed in double quotes (" "). If a " is present in the data, then I have to replace it with \".
Example:
Input: "abc"|"test " user""|"A"B" user"
Output: "abc"|"test \" user\""|"A\"B\" user"
I tried the command below, but it does not give me the expected result.
sed 's/\([^|"]\)\"\([^"|]\)/\1\\"\2/g;'
@suneelkumar.mekala
welcome to the community.
Get used to wrapping your code/data samples with markdown code tags.
I did it for you for now, but please do so going forward.
Otherwise it's hard to see what you're after - particularly in THIS case.
@suneelkumar.mekala
the embedded double-quotes should be perfectly normal for the CSV (in your case PSV - PipeSeparatedValues) files.
But if you insist, here's one way with gawk:
gawk -F'|' -f sunee.awk OFS='|'
where sunee.awk is:
function deq(str, a) {
    if (!match(str, /^(\s*)\x22(.*)\x22(\s*)$/, a))
        return str
    else {
        gsub(/\x22/, "\\\\&", a[2])
        return a[1] "\x22" a[2] "\x22" a[3]
    }
}
{
    for (i = 1; i <= NF; i++)
        $i = deq($i)
    print
}
yielding:
$ echo '"abc"|"test " user""|"A"B" user"'| gawk -F'|' -f sunee.awk OFS='|'
"abc"|"test \" user\""|"A\"B\" user"
and
$ echo '"abc"| "test " user""|"A"B" user" '| gawk -F'|' -f sunee.awk OFS='|'
"abc"| "test \" user\""|"A\"B\" user"
You didn't mention your OS => the above is Linux/gawk specific, but can be modified to be awk-version-agnostic.
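For what it's worth, here is a rough sketch of what an awk-version-agnostic variant might look like. It avoids the gawk-only pieces used above (the third argument to match() and \s). The function name escape_inner_quotes is made up for illustration, and the sketch assumes that only blanks or tabs may surround the outer quotes of a field.

```shell
# Hypothetical helper; reads pipe-separated records on stdin.
escape_inner_quotes() {
  awk -F'|' -v OFS='|' '
    function deq(str,    head, tail, n, parts, i, out) {
      # Leave the field alone unless it looks like: blanks, "...", blanks.
      if (str !~ /^[ \t]*".*"[ \t]*$/) return str
      match(str, /^[ \t]*"/)             # leading blanks + opening quote
      head = substr(str, 1, RLENGTH)
      str  = substr(str, RLENGTH + 1)
      match(str, /"[ \t]*$/)             # closing quote + trailing blanks
      tail = substr(str, RSTART)
      str  = substr(str, 1, RSTART - 1)  # field body without outer quotes
      n = split(str, parts, "\"")        # cut the body at every inner quote
      out = parts[1]
      for (i = 2; i <= n; i++) out = out "\\\"" parts[i]
      return head out tail
    }
    { for (i = 1; i <= NF; i++) $i = deq($i); print }
  '
}
```

Usage: echo '"abc"|"test " user""|"A"B" user"' | escape_inner_quotes prints "abc"|"test \" user\""|"A\"B\" user". The body is rebuilt with split() and plain concatenation rather than gsub(), because the handling of backslashes next to & in a gsub() replacement string has historically differed between awk implementations.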
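Both neighbors of a matching " must be anything but a |:
sed -E 's/([^|])"([^|])/\1\\"\2/g'
But this does not see/match a second " that closely follows the first (as in "A"B"), because with /g the search resumes after the end of the previous match, and the character between the two quotes has already been consumed.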
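If you want to stay with sed, one possible workaround (a sketch, not tested beyond the sample line) is to drop /g and loop: replace one quote, then restart the scan from the beginning of the line until nothing matches any more. The function name escape_quotes_sed is made up. Two assumptions: the input contains no quotes that are already backslash-escaped (a preceding \ is excluded so the loop terminates), and there are no blanks between a | and the outer quotes, since an opening quote after a blank would be escaped too.

```shell
# Hypothetical helper; reads pipe-separated records on stdin.
escape_quotes_sed() {
  # Escape one quote per pass; "t again" restarts the scan after each
  # successful substitution, so quotes sharing a neighbor are all found.
  # [^|\\] also rejects a preceding backslash, which marks a quote that
  # an earlier pass has already escaped.
  sed -E -e ':again' \
         -e 's/([^|\\])"([^|])/\1\\"\2/' \
         -e 't again'
}
```

Usage: echo '"abc"|"test " user""|"A"B" user"' | escape_quotes_sed prints "abc"|"test \" user\""|"A\"B\" user".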
Perl has zero-width lookbehind/lookahead assertions, which test the neighbors without consuming them, so let's switch to Perl!
perl -lpe 's/(?<=[^|])"(?=[^|])/\\"/g'
BTW, RFC 4180 wants the embedded quotes doubled instead:
perl -lpe 's/(?<=[^|])"(?=[^|])/""/g'