Adding an escape character

I have pipe-delimited (|) data in a file, and every field is enclosed in double quotes. If a " appears inside the data, I have to replace it with \".

Example:
Input: "abc"|"test " user""|"A"B" user"
Output: "abc"|"test \" user\""|"A\"B\" user"

I tried the command below, but it does not give the expected result.

sed  's/\([^|"]\)\"\([^"|]\)/\1\\"\2/g;'

@suneelkumar.mekala
welcome to the community.
Get used to wrapping your code/data samples with markdown code tags.
I did it for you for now, but please do so going forward.

Otherwise it's hard to see what you're after, particularly in THIS case.

@suneelkumar.mekala
Embedded double quotes are perfectly normal in CSV (in your case PSV, pipe-separated values) files.
But if you insist, here's one way with gawk:
gawk -F'|' -f sunee.awk OFS='|' where sunee.awk is:

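# Escape the quotes inside one field, keeping the enclosing pair.
# "a" is a local array; the three-argument match() is a gawk extension.
function deq(str,  a) {
    # a[1]/a[3]: surrounding whitespace, a[2]: text between the outer quotes (\x22 is ")
    if(!match(str, /^(\s*)\x22(.*)\x22(\s*)$/, a))
      return str
    else {
      # "\\\\&" yields a literal backslash followed by the matched quote
      gsub(/\x22/, "\\\\&", a[2])
      return a[1] "\x22" a[2] "\x22" a[3]
    }
}
# Rewrite every field; assigning to $i rebuilds the record with OFS.
{
   for(i=1; i<=NF;i++)
     $i=deq($i)
   print
}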

yielding:

$ echo '"abc"|"test " user""|"A"B" user"'| gawk -F'|' -f sunee.awk OFS='|'
"abc"|"test \" user\""|"A\"B\" user"

and

$ echo '"abc"|    "test " user""|"A"B" user"  '| gawk -F'|' -f sunee.awk OFS='|'
"abc"|    "test \" user\""|"A\"B\" user"

You didn't mention your OS. The above is gawk-specific (it relies on the three-argument match()), but it can be modified to work with any awk.
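For readers without gawk, the same per-field idea can be sketched in Python. This is only an illustration, not part of the answer above: the names escape_field and escape_line are made up here, and like the awk script it assumes that a | never occurs inside a field.

```python
import re

# Optional leading whitespace, the opening quote, the field body,
# the closing quote, optional trailing whitespace.
FIELD = re.compile(r'^(\s*)"(.*)"(\s*)$', re.DOTALL)

def escape_field(field: str) -> str:
    """Backslash-escape the quotes inside one quoted field."""
    m = FIELD.match(field)
    if m is None:
        return field  # not a quoted field: leave it untouched
    lead, body, trail = m.groups()
    return lead + '"' + body.replace('"', '\\"') + '"' + trail

def escape_line(line: str, sep: str = "|") -> str:
    """Split on the separator, fix each field, and join again."""
    return sep.join(escape_field(f) for f in line.split(sep))

print(escape_line('"abc"|"test " user""|"A"B" user"'))
# prints "abc"|"test \" user\""|"A\"B\" user"
```

Whitespace around a field is kept as it was, as in the second gawk run above.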

Both neighbors of a " that is to be escaped must be anything but a |:
sed -E 's/([^|])"([^|])/\1\\"\2/g'
But this misses a second " that follows closely, because each match also consumes the neighboring characters and the next /g iteration resumes after the consumed text.
Perl has zero-width lookbehind/lookahead, which test the neighbors without consuming them, so let's switch to Perl!

perl -lpe 's/(?<=[^|])"(?=[^|])/\\"/g'
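The difference between the consuming pattern and the lookaround pattern can be seen side by side in a small Python sketch. Python's re module is used here only because it accepts the same lookaround syntax as Perl; the two function names are invented for the illustration.

```python
import re

def escape_consuming(line: str) -> str:
    # Same shape as the sed command: the characters on both sides of the
    # quote are part of the match, so they cannot start the next match.
    return re.sub(r'([^|])"([^|])', r'\1\\"\2', line)

def escape_lookaround(line: str) -> str:
    # Same shape as the Perl command: the neighbors are inspected
    # but not consumed.
    return re.sub(r'(?<=[^|])"(?=[^|])', r'\\"', line)

line = '"abc"|"test " user""|"A"B" user"'
print(escape_consuming(line))   # "abc"|"test \" user\""|"A\"B" user"
print(escape_lookaround(line))  # "abc"|"test \" user\""|"A\"B\" user"
```

In the consuming version the quote after B stays unescaped, because B was already used up by the previous match.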

BTW, RFC 4180 escapes embedded quotes by doubling them:

perl -lpe 's/(?<=[^|])"(?=[^|])/""/g'
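One practical argument for the doubled form is that standard CSV parsers read it back without extra work. A minimal sketch with Python's csv module, assuming | as the delimiter and that every field should be quoted (the sample row is the data from the question with the enclosing quotes removed):

```python
import csv
import io

rows = [['abc', 'test " user"', 'A"B" user']]

# Write with every field quoted; embedded quotes are doubled by default.
buf = io.StringIO()
writer = csv.writer(buf, delimiter='|', quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerows(rows)
doubled = buf.getvalue()
print(doubled, end='')  # "abc"|"test "" user"""|"A""B"" user"

# Reading the doubled form back recovers the original values.
parsed = list(csv.reader(io.StringIO(doubled), delimiter='|'))
print(parsed == rows)   # True
```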

Thank you.

It works except for one scenario, where it does not give the required output.

Script used:

perl -lpe 's/\\/\\\\/g;s/(?<=[^|])"(?=[^|])/\\"/g'

Input data:

""27/2/24(defg)

"123456
TESTING BLACK""|"FALSE"

Current Output:

 "\"27/2/24(defg)

"123456

TESTING BLACK"\"|"FALSE"

Expected Output:

"\"27/2/24(defg)

"\123456
TESTING BLACK"\"|"FALSE"

The simple substitution works line by line; without extra state it treats a " at the beginning of a line as the start of a new record.
The following (executable!) sed script might better suit your needs:

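#!/usr/bin/sed -f
# A line starting with " opens a record: keep appending lines (N)
# until the buffer ends with " or the input is exhausted.
/^"/{
  :L1
  /"$/b endrec
  $b endrec
  N
  b L1
}
:endrec
# Escape every quote first ...
s/"/\\"/g
# ... then restore the quotes that enclose the fields.
s/^\\"/"/
s/\\"$/"/
s/\\"|\\"/"|"/g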

It gathers a whole record in the pattern space; a " at the end of a line marks the end of the record.
Because sed has no lookbehind/lookahead, the substitution is done globally first and then undone at the field boundaries.
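The two steps of the script (gather a record, then escape everything and undo the boundaries) can be traced in a short Python sketch. It mirrors the sed commands one by one under the same assumptions; the names gather_records and escape_record are made up for this illustration. As with the sed script, a literal "|" inside the data would be mistaken for a field boundary.

```python
def gather_records(lines):
    """Join physical lines into records, like the N loop in the sed script."""
    it = iter(lines)
    for line in it:
        record = line.rstrip("\n")
        if record.startswith('"'):
            # Keep appending until the record ends with a quote.
            while not record.endswith('"'):
                nxt = next(it, None)
                if nxt is None:  # end of input: emit what we have
                    break
                record += "\n" + nxt.rstrip("\n")
        yield record

def escape_record(record):
    out = record.replace('"', '\\"')      # s/"/\\"/g
    if out.startswith('\\"'):
        out = '"' + out[2:]               # s/^\\"/"/
    if out.endswith('\\"'):
        out = out[:-2] + '"'              # s/\\"$/"/
    return out.replace('\\"|\\"', '"|"')  # s/\\"|\\"/"|"/g

lines = ['"123456\n', 'TESTING BLACK""|"FALSE"\n']
for rec in gather_records(lines):
    print(escape_record(rec))
```

For the two input lines above this prints one record that spans two lines, with only the embedded quote after BLACK escaped.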