Removing quotes within quotes

Hello everyone,

I am working on a file with thousands of lines, and rather than editing them by hand I need a script to remove quotes within quotes. For example, a line may look like this:

"Hey, I was ready to go on stage or "break a leg", but I failed miserably."

So I need to remove the quotes around break a leg with something such as awk or sed. Does anyone know how to do this? I've looked online for different examples but did not find anything. If someone could offer a solution, it would save me a ton of time and I would sincerely appreciate it!

That's not a trivial task. In fact, with the "sample" you've given us, it's likely impossible!

Please provide a broader sample of the text.

I didn't know it was that non-trivial a task. My mistake. The thing is that the " " inside the " " occur in random places.

"Of course, never say never. This is perhaps a "See you later"rather than a "Goodbye". Thank you for reading."

"I\'ve been rehabilitated. It\'s such a "wonderful" account of her journey."

Also, if it matters I'm working in notepad++.

If the outer " are always at BOL and EOL, try

awk '{gsub (/^"|"$/, "\001"); gsub (/"/, ""); gsub ("\001", "\"")} 1' file
"Of course, never say never. This is perhaps a See you laterrather than a Goodbye. Thank you for reading."
"I\'ve been rehabilitated. It\'s such a wonderful account of her journey."

It's easy if the line always starts and ends with a quote:

$ sed '/^$/n;s/"//g;s/^/"/;s/$/"/' file
"Of course, never say never. This is perhaps a See you laterrather than a Goodbye. Thank you for reading."

"I\'ve been rehabilitated. It\'s such a wonderful account of her journey."

Otherwise, it's not so easy - for example, if quoted strings span lines.
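
For what it's worth, here is a minimal sketch of a more defensive variant of the sed idea above. It only touches lines that both start and end with a double quote, so blank lines and unquoted lines pass through untouched. The function name strip_inner_quoted_lines is made up for illustration, and quoted strings that span lines are still not handled.

```shell
# Hypothetical helper: reads lines on stdin, writes the result to stdout.
# Only lines of the form "...." are rewritten; everything else is copied as-is.
strip_inner_quoted_lines() {
    sed '/^".*"$/{
        s/"//g
        s/.*/"&"/
    }'
}

# Example run on a quoted line, a blank line and an unquoted line.
printf '%s\n' '"Hey, I was ready to go on stage or "break a leg", but I failed miserably."' \
              '' \
              'Here are some more test' | strip_inner_quoted_lines
```

The first s command deletes every double quote on the line, and the second wraps what is left in a fresh outer pair.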

OK, try this:

awk     'match ($0, /^[^"]*"/)  {$0=substr($0,1,RLENGTH-1) "\001" substr ($0,RLENGTH+1)}
         match ($0, /"[^"]*$/)  {$0=substr($0,1,RSTART-1) "\001" substr ($0,RSTART+1)}
                                {gsub (/"/, ""); gsub ("\001", "\"")}
         1
        ' file

Yes, the line itself always begins and ends with a quote. I will try both of your solutions and report back as soon as I can. Thank you very much for your guidance and help.

Try also this sed version (^A stands for a literal Ctrl-A character, usually entered as Ctrl-V Ctrl-A in the shell):

sed -r 's/(^[^"]*)"/\1^A/; s/"([^"]*$)/^A\1/; s/"//g; s/^A/"/g' file
"Hey, I was ready to go on stage or "break a leg", but I failed miserably."
"Of course, never say never. This is perhaps a "See you later"rather than a "Goodbye". Thank you for reading."
"I\'ve been rehabilitated. It\'s such a "wonderful" account of her journey."
"This is a test"
Here are some more test
Now we have a single ", just for test

Replace the first and last " with a placeholder character, then remove all other ", then convert the placeholder back to "

awk '/^"/ {gsub(/^"|"$/,"�");gsub(/"/,"");gsub(/�/,"\"")}1' file
"Hey, I was ready to go on stage or break a leg, but I failed miserably."
"Of course, never say never. This is perhaps a See you laterrather than a Goodbye. Thank you for reading."
"I\'ve been rehabilitated. It\'s such a wonderful account of her journey."
"This is a test"
Here are some more test
Now we have a single ", just for test

1) "�" is a two-byte-character. No major problem, but to be kept in mind.
2) It may be used in some texts; it will be erased, then.
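
To illustrate the second caveat, here is a rough sketch of the same three-step placeholder trick that uses Ctrl-A (\001) and first checks whether the placeholder already occurs in the line. The function name strip_inner_placeholder is made up for illustration. Writing to "/dev/stderr" is assumed to work in your awk (it does in gawk, mawk and the BSD awk).

```shell
# Hypothetical helper: strip inner quotes using Ctrl-A (\001) as a placeholder.
strip_inner_placeholder() {
    awk '
        # If the placeholder is already in the line, leave the line alone.
        index($0, "\001") {
            print "line " NR ": placeholder already present, left unchanged" > "/dev/stderr"
            print
            next
        }
        /^".*"$/ {
            gsub(/^"|"$/, "\001")   # 1) protect the outer quotes
            gsub(/"/, "")           # 2) drop every remaining quote
            gsub("\001", "\"")      # 3) restore the outer quotes
        }
        { print }
    '
}

printf '%s\n' '"I have been rehabilitated. Such a "wonderful" account."' | strip_inner_placeholder
```

A line that already contains the placeholder is printed unchanged with a warning on stderr, so nothing is silently erased.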

What should I change it to, to make sure it's not used in the text?
Any good suggestions? :)

Nothing. Literally. It seems easier just to remove all double quotes, then add back one at the start and end of the line.

A basic bash version done longhand...
Not sure if it is of any use but it is a _weapon_ of last resort... ;o)
(Hopefully it displays correctly...)

Last login: Sat Aug 31 19:20:17 on ttys000
AMIGA:barrywalker~> text="\"This is a \"quote within\" a quote.\""
AMIGA:barrywalker~> echo -e -n "$text" > /tmp/quotes.txt
AMIGA:barrywalker~> text=""
AMIGA:barrywalker~> newtext=""
AMIGA:barrywalker~> read text < /tmp/quotes.txt
AMIGA:barrywalker~> echo "$text"
"This is a "quote within" a quote."
AMIGA:barrywalker~> echo "${#text}"
35
AMIGA:barrywalker~> # Correct string length...
AMIGA:barrywalker~> for n in $( seq 0 1 ${#text} ); do if [ "${text:$n:1}" == '"' ]; then newtext=$newtext; else newtext=$newtext${text:$n:1}; fi; done
AMIGA:barrywalker~> echo "$newtext"
This is a quote within a quote.
AMIGA:barrywalker~> echo "${#newtext}"
31
AMIGA:barrywalker~> # Correct string length...
AMIGA:barrywalker~> # Done longhand and will be slow but works.
AMIGA:barrywalker~>
AMIGA:barrywalker~> text="\"This is a \"quote within\" a quote.\""
AMIGA:barrywalker~> newtext=""
AMIGA:barrywalker~> for n in $( seq 0 1 ${#text} ); do if [ "${text:$n:1}" == '"' ]; then newtext=$newtext"'"; else newtext=$newtext${text:$n:1}; fi; done
AMIGA:barrywalker~> echo "$newtext"
'This is a 'quote within' a quote.'
AMIGA:barrywalker~> # Quotes changed from double to single...
AMIGA:barrywalker~> _

EDIT:

Juggle the part that is...

${text:$n:1}

...and leave the start and end double quotes in as required...
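
The loop above walks the string one character at a time. A shorter sketch, assuming the string starts and ends with a double quote, peels the outer pair off with parameter expansion, deletes the remaining quotes with tr, and puts the outer pair back. The function name strip_inner_quotes and its variables are made up for illustration. It should run in any POSIX shell, not just bash.

```shell
# Hypothetical helper: takes one string as $1 and prints the result.
# Note: plain POSIX sh has no "local", so text and inner are global here.
strip_inner_quotes() {
    text=$1
    case $text in
        \"*\") ;;                                  # starts and ends with a quote
        *) printf '%s\n' "$text"; return ;;        # anything else: print unchanged
    esac
    inner=${text#\"}                               # drop the leading quote
    inner=${inner%\"}                              # drop the trailing quote
    inner=$(printf '%s' "$inner" | tr -d '"')      # delete all inner quotes
    printf '"%s"\n' "$inner"                       # put the outer pair back
}

strip_inner_quotes '"This is a "quote within" a quote."'
```

It spawns tr once per string, so for a file with thousands of lines one of the awk or sed versions will be much faster.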

My last two suggestions above don't care if it's at the beginning or the end of a line. They leave the outermost quotes alone and remove all the inner ones.
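
The same "keep the outermost pair, drop the inner ones" behaviour can also be sketched without any placeholder character, by locating the first and last quote and cleaning only the text between them. This is just one possible way to do it. The function name keep_outer_quotes and the awk variable names are made up for illustration.

```shell
# Hypothetical helper: reads lines on stdin, writes the result to stdout.
keep_outer_quotes() {
    awk '{
        firstq = index($0, "\"")                 # position of the first quote, 0 if none
        lastq = 0
        for (i = length($0); i > firstq; i--)    # scan backwards for the last quote
            if (substr($0, i, 1) == "\"") { lastq = i; break }
        if (firstq && lastq) {                   # at least two quotes on the line
            mid = substr($0, firstq + 1, lastq - firstq - 1)
            gsub(/"/, "", mid)                   # remove quotes between the outer pair
            $0 = substr($0, 1, firstq) mid substr($0, lastq)
        }
        print
    }'
}

printf '%s\n' '"Of course. This is a "See you later"rather than a "Goodbye". Thanks."' \
              'Now we have a single ", just for test' | keep_outer_quotes
```

Lines with zero or one double quote are printed unchanged, and no character of the input can collide with a placeholder because none is used.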

Good idea, like this:

awk '/^"/ {gsub(/"/,"");$0="\""$0"\""}1' file

Another "double-quotes-always-at-the-beginning-and-end" one:

awk -F'^"|"$' '{gsub(/"/,x,$2)}1' OFS=\" file