Multiple carriage returns within quotation marks causing new lines in csv

Mary_Roberts · January 13, 2015, 5:15pm

There is a closed thread called "carriage returns within quotation marks causing new lines in csv" that I am unable to post to, so I am starting a new thread.

The awk solution worked perfectly in most cases. We have some cases where there are multiple carriage returns within a single quoted field. Is there a way to modify this awk script to have it look for multiple occurrences of CR within a single quoted field?

The example given was:

"apple","banana","orange"
"pineapple","grape","straw
berry"
"apple","banana","cherry"

My example would be:

"apple","banana","orange"
"pineapple","grape","straw

berry"
"apple","banana","cherry"

Thanks for any help.

Chubler_XL · January 13, 2015, 7:00pm

The solution posted by Don Cragun in that thread seems to work fine for your example:

awk '
/^["]/{if(out != "") print out;out = $0;next}
{out = out $0}
END {if(out != "") print out}' infile

Mary_Roberts · January 16, 2015, 11:35am

Thanks for your reply. It turns out that if the CR is just before the final end quote, it does not work. Any other ideas?

*******************************************************

Looks like it only works for cases like this:

"apple berry company
banner
test"

But not for cases like:

"apple berry company
banner
"

If the closing quote is at the beginning of a new line, it does not work. If the line begins with something else followed by the closing quote, it works.

Yoda · January 16, 2015, 11:56am

Try this:-

awk '/"$/{ORS=RS}!/"$/{ORS=FS}1' file
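For reference, here is a small sketch of how this one-liner behaves on the sample from the first post. The file name sample.csv is made up for the demonstration, and the sketch assumes every record ends with a closing quote.

```shell
# Build the sample from the first post (sample.csv is a hypothetical name).
printf '%s\n' \
  '"apple","banana","orange"' \
  '"pineapple","grape","straw' \
  '' \
  'berry"' \
  '"apple","banana","cherry"' > sample.csv

# Lines ending in a quote are terminated with a newline (ORS=RS);
# all other lines are terminated with a space (ORS=FS), which joins
# them to the line that follows.
awk '/"$/{ORS=RS}!/"$/{ORS=FS}1' sample.csv
```

Note that the broken lines are joined with a space per removed line break, so the second record comes out as "straw  berry" (two spaces) rather than "strawberry".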

RudiC · January 16, 2015, 12:43pm

remove <CR>s first:

awk '{gsub(/\r/,""); /"$/{ORS=RS} !/"$/{ORS=FS}1' file

Mary_Roberts · January 16, 2015, 3:55pm

Syntax error

awk: cmd. line:1: {gsub(/\r/,""); /"$/{ORS=RS} !/"$/{ORS=FS}1
awk: cmd. line:1:                     ^ syntax error
awk: cmd. line:1: {gsub(/\r/,""); /"$/{ORS=RS} !/"$/{ORS=FS}1
awk: cmd. line:1:                                   ^ syntax error
awk: cmd. line:2: (END OF FILE)
awk: cmd. line:2: syntax error

RudiC · January 16, 2015, 4:08pm

Quote from Don Cragun:

MadeInGermany · January 16, 2015, 4:46pm

There is a ; instead of a closing }

awk '{gsub(/\r/,"")} /"$/{ORS=RS} !/"$/{ORS=FS}1' file

Here is a sed solution:

sed '
/"[^"][^"]*$/{
:L
N;s/\n//
/"[^"][^"]*$/bL
}
' file

In this case the condition for the following lines is identical, so one can simplify:

sed '
:L
/"[^"][^"]*$/{
N;s/\n//
bL
}' file

Using s/\n/ / instead would replace each removed line ending with a space character.
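A quick way to check the simplified sed loop against the case where the closing quote sits alone on its own line. This is only a sketch: sample2.csv is a made-up file name, and it assumes every field is quoted and that fields contain no embedded quote characters.

```shell
# Hypothetical sample: the closing quote is on a line by itself.
printf '%s\n' \
  '"apple berry company' \
  'banner' \
  '"' \
  '"apple","banana","cherry"' > sample2.csv

# While the pattern space ends in an unclosed quoted field (a quote
# followed only by non-quote characters), append the next line (N)
# and delete the embedded newline, then test again.
sed '
:L
/"[^"][^"]*$/{
N;s/\n//
bL
}' sample2.csv
```

The first three input lines should come out as the single record "apple berry companybanner", followed by the unchanged last line.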

Scrutinizer · January 17, 2015, 5:34am

Also try:

awk '!(NR%2){gsub(/\n/,x)}1' RS=\" ORS=\" file

--
I presume you meant newline, rather than carriage return

--
On Solaris use /usr/xpg4/bin/awk rather than awk
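A short sketch of why this works, run against the sample from the first post (sample3.csv is a made-up name). With RS set to a double quote, awk splits the input at every quote, so the even-numbered records are the text between an opening and a closing quote. That only holds if the quotes are balanced and fields contain no embedded quote characters, which is an assumption here.

```shell
# Sample from the first post (sample3.csv is a hypothetical name).
printf '%s\n' \
  '"apple","banana","orange"' \
  '"pineapple","grape","straw' \
  '' \
  'berry"' \
  '"apple","banana","cherry"' > sample3.csv

# Records are split on the quote character. Even-numbered records are
# the quoted field contents, so newlines are removed only there.
# x is an unset variable, i.e. the empty string.
awk '!(NR%2){gsub(/\n/,x)}1' RS=\" ORS=\" sample3.csv
```

Newlines outside the quotes are left alone, so the record separators between lines survive.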

Chubler_XL · January 18, 2015, 3:06pm

A slight change to solution #2 to cater for your example:

awk '
/^["]./{if(out != "") print out;out = $0;next}
{out = out $0}
END {if(out != "") print out}' infile
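To see the difference the extra dot makes, here is a sketch that runs the script on the failing case from the earlier post. The sample data is written to infile for the demonstration. The pattern /^["]./ only starts a new record when the leading quote is followed by at least one more character, so a line holding nothing but the closing quote is appended to the current record.

```shell
# Failing case from the earlier post: closing quote alone on a line.
printf '%s\n' \
  '"apple berry company' \
  'banner' \
  '"' \
  '"apple","banana","cherry"' > infile

# A line starting with a quote plus at least one more character begins
# a new record; anything else is appended to the record being built.
awk '
/^["]./{if(out != "") print out;out = $0;next}
{out = out $0}
END {if(out != "") print out}' infile
```

This should print "apple berry companybanner" followed by the last line unchanged. One limitation to keep in mind: a continuation line that starts with the closing quote followed by more text, such as ","next", would still be treated as the start of a new record.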