Correct incomplete fields separated by new lines

mehimadri12 · May 29, 2015, 12:51am

Hello Friends,

I have an issue with a comma-separated CSV file. Every record should have 5 fields. The record delimiter is \r\n, but in a few records the address field also contains \r\n, which breaks the record into two or more lines.

Please see the example below. Each record should be read as follows:
firstname,lastname,address,city,state

Example data:

david,smith,123 Lindsay Street,columbus,oh
john,bush,5434A 
Cresent Drive, Cleveland,oh
Micheal,Slater,34E Lobson
Street NE
Apt 3,Burbank,43017
Bill,thompson,1298 Bread Street,Cincinnati,oh

How can I convert the above file to the following?

david,smith,123 Lindsay Street,columbus,oh
john,bush,5434A Cresent Drive, Cleveland,oh
Micheal,Slater,34E Lobson Street NE Apt 3,Burbank,43017
Bill,thompson,1298 Bread Street,Cincinnati,oh

Please let me know if I need to provide further details.

Scrutinizer · May 29, 2015, 2:52am

Try something like:

tr -d '\r' < file | awk -F, '{while (NF<5 && (getline n)>0) $0=$0 n}1'

or

awk -F, '{while (NF<5 && (getline n)>0) $0=$0 n; gsub(/\r/,x)}1' file

Either should work as long as \r\n does not appear in field number 5.

---
On Solaris use /usr/xpg4/bin/awk rather than awk
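
One thing to watch: `$0=$0 n` concatenates the pieces with nothing between them, while the desired output in the question has a single space at each join. Below is a minimal sketch of a variant that inserts one space, assuming that is the wanted separator and that trailing blanks before a break can be dropped. The function name join_records and the variable nxt are made up for illustration.

```shell
# Hypothetical helper: join broken lines until each record has 5 comma-separated fields.
join_records() {
  tr -d '\r' | awk -F, '
    {
      # Keep appending the next input line while the record is still short.
      while (NF < 5 && (getline nxt) > 0) {
        sub(/[ \t]+$/, "")   # drop trailing blanks left before the line break
        $0 = $0 " " nxt      # join with one space; awk re-splits $0 and updates NF
      }
      print
    }'
}

# Sample run on part of the data from the question (with \r\n line endings).
printf 'john,bush,5434A \r\nCresent Drive, Cleveland,oh\r\nMicheal,Slater,34E Lobson\r\nStreet NE\r\nApt 3,Burbank,43017\r\n' | join_records
```

Like the commands above, this still assumes the line break never falls inside field 5, and it would also merge an empty line into the following record.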

venky.b5 · May 29, 2015, 7:24am

Hi Scrutinizer,

I am trying to understand the code for learning purposes. Could you please explain it, if possible?

awk -F, '{while (NF<5 && (getline n)>0) $0=$0 n}1'

Why is it (getline n) > 0?
What I know so far is that getline reads the next line, the trailing 1 is always true, and NF<5 selects records with fewer than 5 fields.

Please correct me if I am wrong.

Thanks

venky
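
For reference on the question above: getline with a variable returns 1 when a line was read, 0 at end of input, and -1 on a read error. Comparing with > 0 therefore ends the loop in both of the last two cases, whereas a bare test would treat -1 as true. A small sketch that prints the return value (the function name show_getline_rc and the variable names are made up for illustration):

```shell
# Hypothetical helper: show NF and the return value of getline for the first record.
show_getline_rc() {
  awk -F, '{
    rc = (getline nxt)   # 1 = a line was read, 0 = end of input, -1 = read error
    print "NF=" NF, "getline=" rc
  }'
}

printf 'a,b,c\n' | show_getline_rc        # one line only: getline hits end of input
printf 'a,b,c\nd,e\n' | show_getline_rc   # a second line exists: getline reads it
```

The first call prints NF=3 getline=0 and the second prints NF=3 getline=1. NF stays at 3 in both because getline into a variable does not change $0.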

RudiC · May 29, 2015, 7:47am

man awk :