Remove carriage return and append the next line

mad_man · May 11, 2016, 10:34am

Hi All,

My requirement is to remove the carriage return from any line whose length is less than 1330 and append the next line to it. Below is a realistic example of the file structure.
Input file:

Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah
blah blah blah
Blah blah blah blah Blah blah blah blah

Output file:

Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah

I want to append the 4th line of the input file to the 3rd line, with a single space in front of it.

I know how to do this with a while-do-done loop, but that only works well for small files. For huge files it leads to performance issues.

So I would like to do this with an awk or sed command. I would appreciate your help.

Thanks.

---------- Post updated at 08:04 PM ---------- Previous update was at 08:02 PM ----------

Note: I am using the AIX 6.0 flavor of UNIX.

RudiC · May 11, 2016, 10:42am

You're talking about <newline> chars, not <carriage return>?

Try

awk  '{while (length < 1330 && getline s>0) $0=$0 " " s}1'  file
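
A minimal sketch of how that getline loop behaves, written as a shell function so the width can be varied. The function name `join_short`, the variable `nxt`, and the demo width of 39 (the length of the full lines in the sample from the first post) are assumptions for illustration, not part of the original answer.

```shell
#!/bin/sh
# join_short WIDTH [FILE...]   (hypothetical helper name)
# Print each record; while a record is shorter than WIDTH, glue the next
# input line onto it with one separating space.
join_short() {
    w=$1
    shift
    awk -v w="$w" '
    {
        # getline into a variable leaves $0 untouched, so the partial
        # record keeps growing until it reaches the target width.
        while (length($0) < w && (getline nxt) > 0)
            $0 = $0 " " nxt
        print
    }' "$@"
}

# Demo on the five-line sample from the first post (full lines are 39 chars).
printf '%s\n' \
    'Blah blah blah blah Blah blah blah blah' \
    'Blah blah blah blah Blah blah blah blah' \
    'Blah blah blah blah Blah' \
    'blah blah blah' \
    'Blah blah blah blah Blah blah blah blah' |
    join_short 39
```

Note that a width larger than the true record length makes every line count as short, so neighbouring lines get merged as well.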

RavinderSingh13 · May 11, 2016, 10:44am

Hello mad man,

Could you please try the following and let me know if it helps?

awk '{gsub(/\r/,X,$0);if(length($0)<1330){Q=Q?Q OFS $0:$0};if(length($0)>1330){if(Q){print Q;Q=""};print}}'   Input_file

Thanks,
R. Singh

mad_man · May 11, 2016, 11:04am

Hi Rudi C,

I tried your solution, but it also appended the lines before and after the affected line to their preceding lines.

---------- Post updated at 08:34 PM ---------- Previous update was at 08:31 PM ----------

Hi R. Singh, I tried your command and it did not produce any output.

RavinderSingh13 · May 11, 2016, 11:07am

Hello mad man,

I'm not sure, but the Input_file you have shown doesn't have any line longer than 1330 characters, so if you are using the same Input_file I don't think it will show any output. You can confirm this by running the following.

awk '{print length($0)}' Input_file

If no line is longer than 1330 characters, you can change the length in the code above.
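
To see every line length at once rather than one number per line, a small tally can help. This is only a sketch; the function name `length_histogram` is made up for illustration. Keep in mind that `length($0)` also counts a trailing carriage return if the file has one.

```shell
#!/bin/sh
# length_histogram [FILE...]   (hypothetical helper name)
# Print "<length> <count>" for every distinct line length, shortest first.
length_histogram() {
    awk '{ n[length($0)]++ } END { for (len in n) print len, n[len] }' "$@" |
        sort -n
}

# Demo: one 2-char line and three 4-char lines.
printf 'aaaa\nbb\ncccc\ndddd\n' | length_histogram
```

For a fixed-width file, the most frequent length is the record width, and the remaining lengths belong to the folded lines.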

Thanks,
R. Singh

RudiC · May 11, 2016, 12:28pm

What are the actual lengths of your lines?

mad_man · May 11, 2016, 2:27pm

@Ravinder Singh

I executed the last command and got 1329 as the maximum value in the file.
I edited the command you gave, replacing every 1330 with 1329, but still no luck.
Thanks

---------- Post updated at 11:57 PM ---------- Previous update was at 11:56 PM ----------

@RudiC, I hope my last post answers your question too.

Aia · May 11, 2016, 3:43pm

Do you mind Perl?

perl -nale 'push @A, @F; if($A[1329]){@p = splice @A, 0, 1330; print "@p"}}END{print "@A"' blah.txt

RavinderSingh13 · May 12, 2016, 12:41am

Hello mad man,

Could you please try the following and let me know if it helps?

awk 'FNR==NR{len=length($0)<len?len:length($0);next} {gsub(/\r/,X,$0);if(length($0)<len){Q=Q?Q OFS $0:$0};if(length($0)==len){if(Q){print Q;Q=""};print}}'  Input_file  Input_file

The code above first finds the maximum line length in your Input_file; then, on the second pass, it joins consecutive lines that are shorter than that maximum and prints them together. There may be more conditions to handle, but based on the requirement you have shown, please try it and let me know how it goes.

EDIT: Adding more robust code here. Let's say our Input_file is as follows.

cat Input_file
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah
blah blah blah
Blah blah blah blah Blah blah blah
Blah blah blah blah Blah blah blah blah

If we take the maximum length in the whole file and then merge the shorter lines, the merged line may exceed that maximum, as happens when running the code above:

awk 'FNR==NR{len=length($0)<len?len:length($0);next} {gsub(/\r/,X,$0);if(length($0)<len){Q=Q?Q OFS $0:$0};if(length($0)==len){if(Q){print Q;Q=""};print}}'  Input_file Input_file
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah Blah blah blah blah Blah blah blah (Which is more than maximum length)
Blah blah blah blah Blah blah blah blah

To handle such cases (if you have any in your Input_file), you could try the following code instead.

awk '{Q=Q?Q OFS $0:$0;len=NF<len?len:NF} END{print Q > "tmp_file";close("tmp_file");system("xargs -n " len " < tmp_file")}'   Input_file; rm tmp_file

The output will then be as follows.

Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah blah
Blah blah blah blah Blah blah blah Blah
blah blah blah Blah blah blah blah

Thanks,
R. Singh

mad_man · May 12, 2016, 6:37am

Hi All

I have attached sample input and output files for your review.

Thanks.

mad_man · May 12, 2016, 7:25am

Thanks for your reply.
The Perl command gave the output on a single line, and every run of multiple spaces in between was shrunk to a single space. But I want the original length preserved.

Thanks.

---------- Post updated at 04:55 PM ---------- Previous update was at 04:10 PM ----------

Hi R. Singh,

The first 2 commands in your post did not produce any output, and the 3rd command produced irregular output. Kindly look at the input and output files I attached.

Note: The rule here is that if a line is not 1329 characters long, then the next line is its missing part. There is no scenario with messages of varying lengths; the fixed length is 1329.

Thanks.

mad_man · May 12, 2016, 7:37am

Hi, please find attached the output file created by the inline Perl command above.

mad_man · May 12, 2016, 7:42am

Hi R.Singh,

Please find attached the output file generated by the 3rd command.

mad_man · May 12, 2016, 7:49am

Hi Rudi C,

I just changed 1330 to 1329 in the above command. It looks like the command just added a space in front of the folded line but did not append it to the previous line. Please find the output file attached; the sample input file is in my previous messages.

Thanks.

RudiC · May 12, 2016, 7:58am

There's one ^M (<carriage return>) too many, which could be removed like this:

awk  '{while (length < 1330 && getline s>0) {sub (/\r/, ""); $0=$0 " " s}}1'  /tmp/Sample\ input\ file.txt

yielding exactly your desired output.
Please be aware that your sample input file's last line is missing the terminal ^M that all the other lines have.
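
One way to check for mixed line endings like this is to count how many lines end in a carriage return and how many do not. The sketch below is an illustration only; `count_cr` is a made-up function name.

```shell
#!/bin/sh
# count_cr [FILE...]   (hypothetical helper name)
# Report how many lines end in a carriage return and how many do not.
count_cr() {
    awk '
    /\r$/ { with_cr++; next }
          { without_cr++ }
    END   { printf "with CR: %d, without CR: %d\n", with_cr, without_cr }
    ' "$@"
}

# Demo: two CRLF-terminated lines followed by one LF-only line.
printf 'one\r\ntwo\r\nthree\n' | count_cr
```

A file where both counts are non-zero has inconsistent endings, which shifts the measured line length by one character on some lines.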

mad_man · May 12, 2016, 8:23am

Hi Rudi C,

Thanks for your reply. I just ran it on the sample input file and it worked well.
But when I multiply the input records with the same scenario, it gives erroneous output. Please find the input and output files attached.

RudiC · May 12, 2016, 8:46am

There are no ^M chars at EOL any more in the input file, and I don't think the two files in post#16 are correlated. Lines in the output are 2660 chars long...?

EDIT: Got it: because the ^M is missing, the lines are only 1329 chars long. Use that value for the comparison and you should come close.
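
Since the threshold depends on whether the lines carry a trailing ^M, one option is to strip it from every line before measuring, so the same width works for both kinds of input. The following is a sketch under that assumption, not a tested fix for the attached files; `join_fixed` and `nxt` are made-up names, and the width passed in is the record length without the carriage return (1329 in this thread).

```shell
#!/bin/sh
# join_fixed WIDTH [FILE...]   (hypothetical helper name)
# Strip a trailing CR from every line, then join short lines with the
# following line(s) until the record reaches WIDTH characters.
join_fixed() {
    w=$1
    shift
    awk -v w="$w" '
    {
        sub(/\r$/, "")                   # drop a trailing CR, if any
        while (length($0) < w && (getline nxt) > 0) {
            sub(/\r$/, "", nxt)          # same for the continuation line
            $0 = $0 " " nxt
        }
        print
    }' "$@"
}

# Demo with 7-char records; the second record is folded across two lines.
printf 'abcdefg\r\nabc\r\nefg\r\nabcdefg\n' | join_fixed 7
```

For the real data the call would look like `join_fixed 1329 Input_file`. The output uses plain newline endings; if carriage returns are needed in the result, setting `ORS` to `"\r\n"` in the awk program is one possibility.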