Append specific lines to a previous line based on sequential search criteria

jesse · August 20, 2009, 2:58pm

I'll try to explain this as best I can. Let me know if it is not clear.

I have large text files that contain data like this:

143593502  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:11 N     line 1 test
line 2 test
line 3 test
143593503  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:10 N     another message

Every line in the file that starts with a 9-digit number (followed by a date/time and so on) begins a unique message. In the example, the first 3 lines are really 1 message (with 2 newlines in it).

This leading 9-digit number increments sequentially.

What I want to do is get each message in its entirety onto 1 line. So I *want* the file to look like:

143593502  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:11 N     line 1 test line 2 test line 3 test
143593503  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:10 N     another message

Note that I'd like there to be a space between the additional lines in a single message.

My first idea was to remove ALL newlines from the file and replace them with spaces, and then work through that data inserting a newline before each of the sequence numbers.

I believe this will solve the problem but unfortunately I don't have the chops to pull it off. I'm sure there are also other, potentially better, ways of solving the problem.
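
For reference, here is a minimal sketch of a streaming variant of that idea in plain shell (it runs under bash). Instead of flattening the whole file and re-splitting it, it decides line by line whether a new message starts. The function name join_messages is made up for illustration, and it assumes every message header is nine digits followed by a space, as in the sample above.

```shell
# join_messages: read the log on stdin, write one message per line to stdout.
# Assumption: a header line starts with exactly nine digits followed by a
# space, as in the sample; every other line continues the previous message.
join_messages() (
    msg=
    started=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' '*)
                # New header: flush the message collected so far.
                if [ "$started" -eq 1 ]; then printf '%s\n' "$msg"; fi
                msg=$line
                started=1
                ;;
            *)
                if [ "$started" -eq 1 ]; then
                    # Continuation line: append it after a single space.
                    msg="$msg $line"
                else
                    # Text before the first header is kept as-is.
                    msg=$line
                    started=1
                fi
                ;;
        esac
    done
    # Flush the last message (also covers a file without a final newline).
    if [ "$started" -eq 1 ]; then printf '%s\n' "$msg"; fi
)
```

It would be used as join_messages < data.txt > joined.txt, where both file names are placeholders. A pure shell loop is slow on very large files, so treat this as a readable baseline rather than the fastest option.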

One potential issue, I suppose, would be if one of the "extra" lines in a single message happened to start with the next 9-digit number in the sequence. I believe the chances of this are slim enough to make it a moot concern for me at this point, but it's something to consider nonetheless.
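
Since the IDs increment, one way to shrink that risk is to accept a line as a new message only when its leading number is exactly the previous ID plus one. A continuation line that begins with some other nine-digit number is then left inside its message; the one case described above (a continuation line that starts with precisely the next ID) still cannot be told apart. A rough sketch in plain shell, where join_sequential is a hypothetical name and "IDs go up by exactly 1 with no gaps" is an assumption drawn from the sample, not a known property of the real files:

```shell
# join_sequential: join continuation lines, but treat a line as a new message
# only when its ID is exactly the previous ID plus one.
# Assumptions: IDs increase by exactly 1 with no gaps, and each header has a
# space after the nine-digit ID. Both are guesses based on the sample data.
join_sequential() (
    msg=
    prev=
    started=0
    while IFS= read -r line || [ -n "$line" ]; do
        is_header=0
        case $line in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' '*)
                id=${line%% *}
                # Drop leading zeros so the shell does not read the ID as octal.
                while [ "${id#0}" != "$id" ] && [ -n "${id#0}" ]; do
                    id=${id#0}
                done
                # The first ID is accepted as-is; later ones must be prev + 1.
                if [ -z "$prev" ] || [ "$id" -eq $(($prev + 1)) ]; then
                    is_header=1
                fi
                ;;
        esac
        if [ "$is_header" -eq 1 ]; then
            # Confirmed header: flush the previous message and start a new one.
            if [ "$started" -eq 1 ]; then printf '%s\n' "$msg"; fi
            msg=$line
            prev=$id
            started=1
        elif [ "$started" -eq 1 ]; then
            # Continuation (or an out-of-sequence number): append with a space.
            msg="$msg $line"
        else
            msg=$line
            started=1
        fi
    done
    if [ "$started" -eq 1 ]; then printf '%s\n' "$msg"; fi
)
```

If the real data can skip IDs, the equality test would glue every later message onto the last one that matched, so it would need loosening (for example to "greater than the previous ID").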

Ideally I would like to do this with either Perl or bash.

Thanks.

durden_tyler · August 20, 2009, 7:16pm

Here's one way to do it with Perl:

$ cat data.txt
143593502  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:11 N     line 1 test
line 2 test
line 3 test
143593503  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:10 N     another message
$ perl -0777 -pe 's/\n(?!\d{9}\s|\z)/ /g' data.txt
143593502  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:11 N     line 1 test line 2 test line 3 test
143593503  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:10 N     another message

tyler_durden

summer_cherry · August 21, 2009, 1:49am

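sed -n '/^[0-9]\{9\}/{
        # Header line: print the message held so far, then hold the new one.
        1{h;${p;};}
        1!{
        	${x;s/\n/ /g;p;x;p;}
        	$!{x;s/\n/ /g;p;}
        }
        }
        /^[0-9]\{9\}/!{
        # Continuation line: append it to the held message.
        H
        ${x;s/\n/ /g;p;}
        }'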

ranjithpr · August 21, 2009, 2:01am

Using awk

awk 'NR>1 {printf("%s", ($1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) ? "\n" : " ")} {printf("%s",$0)} END{if (NR) printf("\n")}' file

Regards,

Ranjith