There is a closed thread called "carriage returns within quotation marks causing new lines in csv" that I am unable to post to, so I am starting a new thread.
The awk solution worked perfectly in most cases. We have some cases where there are multiple carriage returns within a single quoted field. Is there a way to modify this awk script to have it look for multiple occurrences of CR within a single quoted field?
The example given was:
"apple","banana","orange"
"pineapple","grape","straw
berry"
"apple","banana","cherry"
My example would be:
"apple","banana","orange"
"pineapple","grape","straw
berry"
"apple","banana","cherry"
Thanks for any help.
The solution posted by Don Cragan in that thread seems to work fine for your example:
awk '
/^["]/{if(out != "") print out;out = $0;next}
{out = out $0}
END {if(out != "") print out}' infile
Thanks for your reply. It turns out that if the CR is just before the final end quote, it does not work. Any other ideas?
*******************************************************
Looks like it only works for cases like this:
"apple berry company
banner
test"
But not for cases like:
"apple berry company
banner
"
If the closing quote is the beginning of a new line then it does not work. If the line begins with something else and then a closing quote then it works.
Yoda
January 16, 2015, 11:56am
4
Try this:-
awk '/"$/{ORS=RS}!/"$/{ORS=FS}1' file
RudiC
January 16, 2015, 12:43pm
5
remove <CR>s first:
awk '{gsub(/\r/,""); /"$/{ORS=RS} !/"$/{ORS=FS}1' file
Syntax error
awk: cmd. line:1: {gsub(/\r/,""); /"$/{ORS=RS} !/"$/{ORS=FS}1
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: {gsub(/\r/,""); /"$/{ORS=RS} !/"$/{ORS=FS}1
awk: cmd. line:1: ^ syntax error
awk: cmd. line:2: (END OF FILE)
awk: cmd. line:2: syntax error
There is a ; instead of a closing }
awk '{gsub(/\r/,"")} /"$/{ORS=RS} !/"$/{ORS=FS}1' file
Here is a sed solution:
sed '
/"[^"][^"]*$/{
:L
N;s/\n//
/"[^"][^"]*$/bL
}
' file
In this case the condition for the following lines is identical, so one can simplify:
sed '
:L
/"[^"][^"]*$/{
N;s/\n//
bL
}' file
Using s/\n/ / instead of s/\n// would replace the line endings with a space character.
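For illustration, here is the simplified sed loop with the space substitution, wrapped in a shell function so it can be tried on a pipe or a file. The function name join_with_space is made up for this sketch; the sed commands themselves are the ones from the post above, so treat it as a quick demonstration rather than a tested general solution.

```shell
# Hypothetical wrapper (the name is an assumption, not from the thread).
# While the line ends in a quote followed by one or more non-quote
# characters, a quoted field is still open: append the next line (N)
# and replace the embedded newline with a space, then test again.
join_with_space() {
  sed '
:L
/"[^"][^"]*$/{
N
s/\n/ /
bL
}' "$@"
}
```

Usage would be either join_with_space file or some_command | join_with_space. A field broken over several lines is joined with one space per removed line break.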
Also try:
awk '!(NR%2){gsub(/\n/,x)}1' RS=\" ORS=\" file
--
I presume you meant newline, rather than carriage return
--
On Solaris use /usr/xpg4/bin/awk rather than awk
A slight change to solution #2 to cater for your example:
awk '
/^["]./{if(out != "") print out;out = $0;next}
{out = out $0}
END {if(out != "") print out}' infile
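The pattern-based variants above all depend on where the quote happens to fall on a line. Another approach, not posted in the thread and offered only as a sketch, is to count the double quotes and keep joining lines until the count is even. It assumes that quotes appear only as field delimiters or as doubled "" escapes, so a complete record always holds an even number of them. The function name join_quoted_lines is hypothetical.

```shell
# Hypothetical helper: join physical lines until the quotes balance.
# Assumption: a complete CSV record contains an even number of double quotes.
join_quoted_lines() {
  awk '
    {
      sub(/\r$/, "")            # drop a trailing carriage return, if present
      n += gsub(/"/, "&")       # count the double quotes on this line
      buf = buf $0              # append the line to the pending record
      if (n % 2 == 0) {         # quotes balanced: the record is complete
        print buf
        buf = ""; n = 0
      }
    }
    END { if (buf != "") print buf }   # flush an unterminated record
  ' "$@"
}
```

Run it as join_quoted_lines infile. Like the solutions above, it joins the pieces with nothing in between; change buf = buf $0 to insert a space if that is preferred. It should also cope with a closing quote that sits alone at the start of a line, since only the running quote count matters.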