Removal of new line character in double quotes

vsairam · May 18, 2010, 9:12am

Hi,

Could you please help me remove the newline characters that appear between double quotes and replace each one with a space?

For example ...

Every field is wrapped in double quotes, and fields are separated by commas. So I need to scan from the first double quote to the second and replace any newline in between with a space, then do the same from the third quote to the fourth, and so on.

Input:

"ABCD RENT-A-
CAR XYZ LTD","00N0H","Enterprise Lake
View Way"

The output should look like this:

"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake View Way"

zaxxon · May 18, 2010, 9:16am

If you have GNU sed, for example, and there are no more lines of text before or after this snippet, you can try:

sed -e :a -e 'N; s/\n/ /g; ta' infile
"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake View Way"
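The command above relies on a GNU sed behavior: when N finds no next line, GNU sed prints the pattern space before exiting, whereas the GNU sed manual notes that most other versions exit without printing anything. A portable sketch guards N with $! so it never runs on the last line. The wrapper name join_all_lines is made up for illustration, and like the original it joins every line of the input, not only the lines inside quotes.

```shell
# join_all_lines is a hypothetical helper name, used only for this example.
# :a       label to loop back to
# $!N      append the next line, but only when this is not the last line
# s/\n/ /  replace the embedded newline with a space
# ta       if that substitution succeeded, loop back for another line
join_all_lines() {
  sed -e :a -e '$!N; s/\n/ /; ta' "$@"
}

printf '%s\n' '"ABCD RENT-A-' 'CAR XYZ LTD","00N0H","Enterprise Lake' 'View Way"' | join_all_lines
# prints: "ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake View Way"
```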

Franklin52 · May 18, 2010, 9:23am

Another approach with awk :):

awk -F"\"" '!$NF{print;next}{printf("%s ", $0)}' file

alister · May 18, 2010, 1:51pm

Like zaxxon, I also like this approach; clever use of FS and NF.

However, it does have a bug. If the value of $NF is the number zero, !$NF will be true (since $NF is evaluated numerically, instead of as a string), which would be incorrect. The solution would be to use length($NF) or concatenate a null string to force conversion to a string type, $NF"".
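To see the pitfall in isolation, here is a minimal check on the second line of the example data, whose last quote-separated field is the digit 0. The variable names are arbitrary; any POSIX awk should behave the same way, since a field that looks numeric is tested as a number.

```shell
# Second line of the example data: the text after the last quote is "0".
line='CAR XYZ LTD","00N0H","Enterprise Lake","0'

# !$NF: "0" looks like a number, so awk tests it numerically; 0 is false,
# and the field is wrongly treated as empty.
numeric=$(printf '%s\n' "$line" | awk -F'"' '{ print (!$NF ? "empty" : "non-empty") }')

# !length($NF): tests the length of the field instead, so "0" is non-empty.
by_length=$(printf '%s\n' "$line" | awk -F'"' '{ print (!length($NF) ? "empty" : "non-empty") }')

# !($NF ""): concatenating an empty string forces a string test.
as_string=$(printf '%s\n' "$line" | awk -F'"' '{ print (!($NF "") ? "empty" : "non-empty") }')

printf '%-14s %s\n' '!$NF' "$numeric" '!length($NF)' "$by_length" '!($NF "")' "$as_string"
```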

Example:

$ cat data
"ABCD RENT-A-
CAR XYZ LTD","00N0H","Enterprise Lake","0
View Way"

$ # Incorrect
$ awk -F"\"" '!$NF{print;next}{printf("%s ", $0)}' data
"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake","0
View Way"

$ # Correct
$ awk -F"\"" '!($NF""){print;next}{printf("%s ", $0)}' data
"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake","0 View Way"

$ # Correct
$ awk -F"\"" '!length($NF){print;next}{printf("%s ", $0)}' data
"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake","0 View Way"

A golfed version of Franklin52's approach:

$ awk -F'"' '$NF""{printf("%s ", $0);next}1' data
"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake","0 View Way"

Even so, I'm not sure this approach meets the original poster's needs. If a line ends on a quote that opens a field (that is, the running count of quotes in the record is odd at that point), the newline after it will not be replaced with a space.

Regards,
Alister

Franklin52 · May 18, 2010, 2:01pm

Good point!

Regards

ygemici · May 18, 2010, 2:17pm

# cat cutt
"ABCD RENT-A-
CAR XYZ LTD","00N0H","Enterprise Lake
View Way"

# a=;while read line; do a="$a $line"; done <cutt ; echo $a
"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake View Way"

alister · May 18, 2010, 3:42pm

ygemici:

I believe the goal is to not naively join all lines in the data, but only those lines which span quoted text. Otherwise, a simple paste command would do the job.

paste -sd' ' data
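A quick illustration of why a plain join is too blunt: paste -s merges every line, so a record that was already complete gets glued onto the next one. The two sample records are loosely based on the thread's data and chosen only for the demonstration.

```shell
# One record that is already complete, then one whose quoted field spans two lines.
joined=$(printf '%s\n' '"leave me alone"' '"ABCD RENT-A-' 'CAR XYZ LTD"' | paste -sd' ' -)

# Prints: "leave me alone" "ABCD RENT-A- CAR XYZ LTD"  (two records merged into one line)
printf '%s\n' "$joined"
```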

Regards,
Alister

---------- Post updated at 03:42 PM ---------- Previous update was at 03:26 PM ----------

Here's a solution that only replaces a newline with a space when that newline occurs between an opening quote character and its corresponding close quote.

sed -n 'H;g;/^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/d; s/^\n//; y/\n/ /; p; s/.*//; h' data

So long as the total number of quote characters encountered is odd, a line is appended to its predecessor. When finally an even number of quote characters have been seen, the resulting concatenation of lines is printed, with all embedded newlines converted to spaces.
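For readers who find the one-liner dense, here is the same sed script spread over several lines with a comment on each step. It assumes GNU sed (which accepts # comments inside the script and \n in the y command) and wraps the script in a function named join_quoted_sed purely for illustration; the input is the trial-run data used in this post.

```shell
# join_quoted_sed is a hypothetical wrapper around the one-liner above.
join_quoted_sed() {
  sed -n '
    # Append the current line to the hold space (H inserts a newline first).
    H
    # Copy the whole accumulated record into the pattern space.
    g
    # An odd number of quotes means a quoted field is still open:
    # delete the pattern space and read the next line (hold space is kept).
    /^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/d
    # Even count: drop the newline that H put before the first line,
    s/^\n//
    # turn the remaining embedded newlines into spaces, and print.
    y/\n/ /
    p
    # Empty the hold space so the next record starts fresh.
    s/.*//
    h
  ' "$@"
}

printf '%s\n' '"leave me alone"' '' '"ABCD RENT-A-' 'CAR XYZ LTD","00N0H","Enterprise Lake","' '100 View Way"' | join_quoted_sed
```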

Trial run with sample data:

$ cat data
"leave me alone"

"ABCD RENT-A-
CAR XYZ LTD","00N0H","Enterprise Lake","
100 View Way"
$ sed -n 'H;g;/^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/d; s/^\n//; y/\n/ /; p; s/.*//; h' data
"leave me alone"

"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake"," 100 View Way"

The AWK solutions would mishandle the continuation of "100 View Way":

$ awk -F'"' '$NF""{printf("%s ", $0);next}1' data
"leave me alone"

"ABCD RENT-A- CAR XYZ LTD","00N0H","Enterprise Lake","
100 View Way"
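The same quote-counting idea can be sketched in awk instead of testing $NF, which also sidesteps the "0" pitfall. This is an illustrative variant rather than one of the commands posted in the thread; join_quoted_awk is a hypothetical name, and like the sed script it assumes every double quote in the data delimits a field (no backslash-escaped quotes).

```shell
# join_quoted_awk is a hypothetical helper name. It counts every quote seen
# so far in the current record instead of looking at the last field.
join_quoted_awk() {
  awk '
    {
      # Inside an open quote (odd count so far)? Join with a space;
      # otherwise this line starts a new record.
      rec = (n % 2) ? (rec " " $0) : $0
      # gsub returns the number of replacements; replacing each quote
      # with itself (&) just counts the quotes on this line.
      n += gsub(/"/, "&")
    }
    # Even count: every opened quote is closed, so the record is complete.
    n % 2 == 0 { print rec; n = 0 }
    # Unbalanced quotes at end of input: print whatever was collected.
    END { if (n % 2) print rec }
  ' "$@"
}

printf '%s\n' '"leave me alone"' '' '"ABCD RENT-A-' 'CAR XYZ LTD","00N0H","Enterprise Lake","' '100 View Way"' | join_quoted_awk
```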

Regards,
Alister

ygemici · May 19, 2010, 3:44pm

alister:

Hmm, thanks alister for your good ideas.
I just wanted to show a simple, practical solution.

Maybe vsairam only wants to join the lines from one given line to another in the data:

[root@rhnserver ~]# sed '/"ABCD/,/100 View Way/{H;x;s/\n//;h;$!d;}' data
"leave me alone"
 
"ABCD RENT-A-CAR XYZ LTD","00N0H","Enterprise Lake","100 View Way"

Regards
ygemici
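Two caveats about the range-based command: s/\n// deletes the newlines instead of replacing them with spaces (hence RENT-A-CAR), and because $!d discards every line of the range that is not the last line of the file, the joined record is only printed when the range runs to the end of the input. A sketch that keeps the range idea but addresses both points, assuming GNU sed; join_range is a hypothetical name, and the start and end patterns are copied from the example and would need adjusting for other data.

```shell
# join_range is a hypothetical helper name; the patterns come from the example.
join_range() {
  sed '
    /"ABCD/,/100 View Way"/{
      # Not the last line of the range yet: stash it and print nothing.
      /100 View Way"/!{
        H
        d
      }
      # Last line of the range: add it, then swap the collected lines
      # into the pattern space and leave the hold space empty.
      H
      s/.*//
      x
      # Drop the leading newline from H and turn the rest into spaces.
      s/^\n//
      y/\n/ /
    }
  ' "$@"
}

# The extra line after the range shows the record is printed mid-file too.
printf '%s\n' '"leave me alone"' '' '"ABCD RENT-A-' 'CAR XYZ LTD","00N0H","Enterprise Lake","' '100 View Way"' 'trailing line' | join_range
```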