Removing " from a text using awk

I was testing something: from this string I am trying to remove every " but not \"

cat file
The quick brown fox "jumps", over  the 'lazy \"dog\"'

result requested: The quick brown fox jumps, over the 'lazy \"dog\"'

I have seen a working solution for sed, but I like awk :)

This code seems to work, but for some reason it also removes the blank space after fox. Why?

awk '{gsub(/[^\\]\"/,x)}1' file
The quick brown foxjump, over  the 'lazy \"dog\"'

As far as I understand, the pattern means: a character that is not \ followed by a " . So why is the space gone?

EDIT:
I found out why :)
[^\\] matches any single character that is not \ , so the space is consumed along with the quote, and so is the s in jumps.
Any idea how to fix this in awk?
gensub should work, but it is not very portable.
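
To see exactly which characters the bracket expression consumes, one can wrap each match in markers instead of deleting it. This is only an illustrative sketch; the function name show_matches is made up for this example and does not come from the thread.

```shell
# show_matches is a hypothetical helper name, not part of the thread.
# It wraps every match of /[^\\]"/ in <...> so the consumed character is visible.
show_matches() {
    printf '%s\n' "$1" | awk '{gsub(/[^\\]"/,"<&>")}1'
}

show_matches 'fox "jumps", over'
# prints: fox< ">jump<s">, over
```

Each match is two characters long (the quote plus whatever precedes it), which is why the space and the s disappear when the match is replaced with nothing.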

I found two working versions.
The first is not perfect, but it is portable.
The second is less portable.

awk '{gsub(/\\\"/,"_#_");gsub(/\"/,x);gsub(/_#_/,"\\\"")}1'
awk '{print gensub(/([^\\])\"/, "\\1", "g")}'

Try one more

$ awk -F '\\\\"' '{for(i=1;i<=NF;i++){gsub("\"","",$i)}}1' OFS='\\"' file

The quick brown fox jumps, over  the 'lazy \"dog\"'

Would this help you?

$ awk '{gsub(/[^\\A-Za-z]\"/," ");gsub(/"\,/,",")}1' file

Resulting

The quick brown fox jumps, over  the 'lazy \"dog\"'

Your s is missing because gsub(/[^\\]\"/," ") removes each quote together with the character in front of it, whether that is a space or a letter. To keep letters, it was supposed to be
gsub(/[^\\A-Za-z]\"/," ")

For example

$ echo "test \"demo\" " | awk '{gsub(/[^\\]\"/," ");}1'
test dem  # o is missing here
$ echo "test \"demo\" " | awk '{gsub(/[^\\A-Za-z]\"/," ");}1'
test demo" 

@Akshay Hegde
Your solution fails on this line:

The quick brown fox "jumps", over  the 'lazy \"dog\"' here are some "more data"

It gives:

The quick brown fox jumps, over  the 'lazy \"dog\"' here are some more data"

The final " is not removed.

Thanks Jotne

Try this; for the given input it will work :)

$ awk '{gsub(/[^\\A-Za-z]\"|"$/," ");gsub(/"\, /,",")}1' file
The quick brown fox jumps,over  the 'lazy \"dog\"' here are some more data

OR

$ awk '{for(i=1;i<=NF;i++){if($i~/^"|"$/)gsub(/"/,"",$i);printf "%s%s",$i,FS}printf "%s",RS}' file
The quick brown fox jumps, over the 'lazy \"dog\"' here are some more data 

I mixed some things up, sorry :)
pamu's solution is OK, and so is mine.
And Akshay Hegde's now works fine.

Hi,
If you use & (the matched text) in the replacement string:

awk '{gsub(/[^\\]"/,"&\"");gsub(/""/,"")}1' file

Regards.

Smart idea :)
You just add an extra character after each stand-alone " and then remove the pair.
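
To make the two steps visible, here is a small sketch that prints the line after each gsub. The function name mark_then_strip is hypothetical and only used for this illustration.

```shell
# mark_then_strip is a hypothetical name used only for this illustration.
# Step 1 doubles every quote not preceded by a backslash; step 2 deletes the doubled quotes.
mark_then_strip() {
    printf '%s\n' "$1" | awk '{
        gsub(/[^\\]"/, "&\"")   # & is the matched text, so  x"  becomes  x""
        print "after step 1: " $0
        gsub(/""/, "")          # remove each doubled quote
        print "after step 2: " $0
    }'
}

mark_then_strip 'fox "jumps", \"dog\"'
# after step 1: fox ""jumps"", \"dog\"
# after step 2: fox jumps, \"dog\"
```

The escaped quotes are never doubled in step 1, so step 2 leaves them alone.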

Just a small variation (saves two characters):

awk '{gsub(/[^\\]"/,"&_");gsub(/"_/,x)}1' file

Hello Akshay,

Could you please explain both pieces of code a bit?

Thanks,
R. Singh

Both suggestions will run into trouble if the replacement string is already present on the line, or if a double quote is in the first position.
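
Both failure modes can be reproduced with short inputs. A minimal sketch, assuming the "&_" variation quoted earlier in the thread; the wrapper name strip_with_marker is made up for this example.

```shell
# strip_with_marker is a hypothetical wrapper around the "&_" variation from the thread.
strip_with_marker() {
    printf '%s\n' "$1" | awk '{gsub(/[^\\]"/,"&_");gsub(/"_/,x)}1'
}

# A quote in the first position has no preceding character, so it is never marked.
strip_with_marker '"a" b'
# prints: "a b

# The marker sequence "_ already occurs after an escaped quote and gets deleted.
strip_with_marker 'say \"_hi\"'
# prints: say \hi\"
```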

--
Try:

awk '{gsub(/\\"/,RS); gsub(/"/,x); gsub(RS,"\\\"")}1' file

With RS="\n" as the record separator, we can be sure that it will never appear on the line, so that is a suitable intermediate character.
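
As a quick check, the one-liner above can be run against a line that combines the awkward cases: a quote in the first position, two consecutive quotes, and a string that looks like a hand-picked placeholder. This is just a sketch; the wrapper name strip_unescaped is hypothetical.

```shell
# strip_unescaped is a hypothetical wrapper around the RS-based one-liner above.
strip_unescaped() {
    # 1. park every \" on the record separator, 2. delete the remaining quotes,
    # 3. turn the parked separators back into \"
    printf '%s\n' "$1" | awk '{gsub(/\\"/,RS); gsub(/"/,x); gsub(RS,"\\\"")}1'
}

strip_unescaped '"a" ""b \"c\" _#_'
# prints: a b \"c\" _#_
```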

We can always create a string that is not present in the input; here is a fix for the case of a double quote in the first position:

awk '{gsub(/^"|[^\\]"/,"&ASTRINGTHATNOTEXIST");gsub(/"ASTRINGTHATNOTEXIST/,"")}1' file

Regards.

That will still pose a problem if there are two consecutive double quotes ( "" ) in the input file.
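
A short sketch of the consecutive-quote case, using the ASTRINGTHATNOTEXIST version quoted above. The wrapper name strip_with_long_marker is made up for this example.

```shell
# strip_with_long_marker is a hypothetical wrapper around the marker-based one-liner above.
strip_with_long_marker() {
    printf '%s\n' "$1" | awk '{gsub(/^"|[^\\]"/,"&ASTRINGTHATNOTEXIST");gsub(/"ASTRINGTHATNOTEXIST/,"")}1'
}

strip_with_long_marker 'a ""b'
# prints: a "b
```

The first quote is matched together with the space in front of it. The second quote would need the first quote as its preceding character, but that character has already been consumed by the previous match, so the second quote is never marked.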

Try this awk one liner...

awk -F'"' '{for (i=1;i<=NF;i++) printf("%s%s",$i,(i < NF ? ($i ~ /\\$/ ? FS : "") : "\n"))}' file

Golfing with ORS:

awk '{ORS=(/\\$/ ? RS : x)}1' RS=\"

That should always work, except in the unlikely case that ....

Can you figure it out?

Regards,
Alister

Sorry Jotne...

As I know little awk and its derivatives I decided to use shell builtins and /bin commands
only to see how easy it is...

#!/bin/sh
# Using shell builtins and /bin ONLY...
# Note: ${text:offset:length} is a bash extension, so /bin/sh must be bash (as it is on OSX).
# Generate the string.
echo 'The quick brown fox "jumps", over the lazy \"dog\".\c' > /tmp/text
# Load the file into a string variable.
text=$(cat < /tmp/text)
# Show it...
echo "$text"
newtext=""
decimal=0
subscript=0
length=$(( ${#text} - 1 ))
while [ $subscript -le $length ]
do
	# ASCII code of the current character; 34 is a double quote.
	decimal=$(printf "%d" "'${text:$subscript:1}")
	if [ $decimal -eq 34 ]
	then
		# Skip an unescaped double quote.
		subscript=$(( subscript + 1 ))
	fi
	if [ "${text:$subscript:2}" = '\"' ]
	then
		# Copy an escaped quote through unchanged.
		newtext=$newtext'\"'
		subscript=$(( subscript + 2 ))
	else
		newtext=$newtext${text:$subscript:1}
		subscript=$(( subscript + 1 ))
	fi
done
# Print the final string...
echo "$newtext"

Results on OSX 10.7.5 using /bin/sh...

Last login: Thu Oct 24 17:44:02 on ttys000
AMIGA:barrywalker~> ./quotes.sh
The quick brown fox "jumps", over the lazy \"dog\".
The quick brown fox jumps, over the lazy \"dog\".
AMIGA:barrywalker~> 

I liked this challenge but only wish I knew more about awk...

@alister, very nice! The only caveat would be that if there are very long character sequences without double quotes, the maximum record length could be exceeded....

@wisecracker, that is OT considering that this thread is specifically about awk.

The revised script below seems to work fine (I have not found an example that does not work):

awk '{gsub(/^"|[^\\]"/,"&\\");gsub(/^"|"\\"?/,"")}1' file

Advantage: no need to search for a string that does not exist in the input...

Problem for me (at this moment): I don't know how to explain why it works. :o

Regards.

This does however pose a problem with \"\

Thanks for finding this problem with \"\ . I don't think a solution of this kind exists without using a string that does not occur in the file.
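
For reference, a small sketch that reproduces the \"\ problem with the revised script. The wrapper name strip_with_backslash_marker is hypothetical; the awk program is the one posted above.

```shell
# strip_with_backslash_marker is a hypothetical wrapper around the revised script above.
strip_with_backslash_marker() {
    printf '%s\n' "$1" | awk '{gsub(/^"|[^\\]"/,"&\\");gsub(/^"|"\\"?/,"")}1'
}

# Ordinary input works: unescaped quotes are marked as "\ and then removed.
strip_with_backslash_marker 'a "b" \"c\"'
# prints: a b \"c\"

# An escaped quote followed by a backslash looks exactly like a marked quote.
strip_with_backslash_marker 'a \"\ b'
# prints: a \ b
```

In the second case the first gsub marks nothing, but the second gsub still finds "\ in the original data and deletes it.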

Regards.

True, but that can be an issue regardless of record delimiter. I hope that implementations with a hardcoded limit are smart enough to fail loudly (scream and stop) in such cases.

The caveat I had in mind was the case of input data ending with a backslash (which would produce a spurious trailing double quote). This would not be a valid text file, but "text" that does not end with a newline isn't unheard of.

Regards,
Alister
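
The backslash caveat can be reproduced with a short sketch of the ORS one-liner reading from standard input. The wrapper name strip_by_record is made up for this example.

```shell
# strip_by_record is a hypothetical wrapper around the ORS one-liner from the thread.
# Each record ends at a double quote; the quote is written back only when the
# record ends with a backslash, that is, when the quote was escaped.
strip_by_record() {
    awk '{ORS=(/\\$/ ? RS : x)}1' RS=\"
}

printf '%s\n' 'fox "jumps", \"dog\"' | strip_by_record
# prints: fox jumps, \"dog\"

# Input that ends with a backslash and has no trailing newline:
printf '%s' 'ends with a backslash\' | strip_by_record
# prints: ends with a backslash\"   (spurious trailing quote, no newline)
```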