Multiple lines into one using PERL or SHELL

Amit.Sagpariya · June 2, 2010, 9:23am

Hi All,

I need your help to solve a problem using either a Perl script or a shell script.

We are receiving a file in which one record is spread across multiple rows. The main problem is that we are not able to tell where the first record ends and where the second record starts.

For example,

 
I
am 
a
student
in 
which 
company
are
you working

Actually, these should be as follows:

 
I am a student
In which company are you working.

Can someone help me solve this problem? I was asked to use Perl's chomp to solve it, but when I googled, I found that chomp is used to remove a trailing newline from the end of a string.

I really do not understand how to get rid of it.

Peace_Dude1 · June 2, 2010, 9:34am

From the sample you have provided, it would be difficult to solve your problem using a simple shell script.

What signifies the end of a sentence? A period perhaps.

Peace
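
A minimal sketch of that idea, assuming (and this is only an assumption, since the sample's first sentence has no period) that every record ends with a period. The function name join_until_period is hypothetical. It glues lines together with single spaces and prints a record whenever the accumulated text ends in a period:

```shell
#!/bin/sh
# join_until_period FILE
# Hypothetical helper: assumes a trailing period marks the end of a record.
join_until_period() {
    awk '
        {
            sub(/^[ \t]+/, "")       # trim leading blanks
            sub(/[ \t\r]+$/, "")     # trim trailing blanks and a stray CR
            if ($0 == "") next       # skip empty lines
            rec = (rec == "" ? $0 : rec " " $0)
            if (rec ~ /\.$/) {       # the record is complete
                print rec
                rec = ""
            }
        }
        END { if (rec != "") print rec }   # flush an unterminated tail
    ' "$1"
}
```

Called as `join_until_period input.txt`, this only works if the data really carries such a terminator; without one, the records cannot be told apart this way.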

pseudocoder · June 2, 2010, 9:40am

That's a BIG problem.

durden_tyler · June 2, 2010, 10:04am

I think the best thing to do is to talk to the person/department/group/team that generates your data file and ask them to put a delimiter character that marks the end of a record. Or maybe enclose the multiline records within double quotes.

Put simply, if a human being cannot figure out the end of a record, then a computer most definitely cannot figure it out either!

tyler_durden

Amit.Sagpariya · June 3, 2010, 7:56am

Hi All,

I have found an idea for handling this situation.

Actually, each line contains only 80 characters. But as per the specification document, the first 540 characters are the HEADER, each following block of 540 characters is a DETAIL RECORD, and the very last row is the FOOTER record.

In this case, the first 6 lines (80 characters each) plus the first 60 characters of the 7th line make up the HEADER RECORD (6 × 80 + 60 = 540).

DETAIL RECORD 1 then runs from the 61st character of the 7th line through the 40th character of the 14th line (20 + 6 × 80 + 40 = 540), and so on.

Is it possible to read line by line, count the characters, and separate the records?

Can somebody help me write this kind of logic, or suggest another idea?
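
One possible approach, sketched under the assumptions stated above: every record is exactly 540 characters, the line breaks carry no meaning, and each 80-character line keeps its trailing spaces (if the padding was stripped, the counts will drift). Instead of counting characters line by line, the sketch removes all line breaks and then cuts the resulting stream every 540 characters. The function name reflow_records is hypothetical; tr and fold are standard POSIX utilities.

```shell
#!/bin/sh
# reflow_records FILE [RECORD_LEN]
# Hypothetical helper: rebuilds fixed-width records from a file that was
# wrapped at a shorter line length. RECORD_LEN defaults to 540.
reflow_records() {
    record_len=${2:-540}
    # Delete CR and LF so the file becomes one long stream, then cut the
    # stream into chunks of record_len columns, one chunk per output line.
    tr -d '\r\n' < "$1" | fold -w "$record_len"
    # fold leaves the final chunk without a newline, so add one here.
    echo
}
```

Called as `reflow_records input.txt > records.txt`, the first output line would be the header and each following line one detail record. If the footer is shorter than 540 characters, it ends up as a short last line.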