Join lines with the same start string

andrejm · October 1, 2011, 7:09am

I have text like this:

DN11-001 Thats the first line which needs to be
DN11-001 joined with the second line and also to
DN11-001 the third line as they all begin with the same
DN11-001 document number.
DN11-002 The number of lines differ
DN11-002 among the documents.
DN11-005 It can also be just one line.

and I need to modify it so that all text with the same start string ends up on one row:

DN11-001 Thats the first line which needs to be joined with the second line and also to the third line as they all begin with the same document number.
DN11-002 The number of lines differ among the documents.
DN11-005 It can also be just one line.

Thank you.
Andrej

ahamed101 · October 1, 2011, 7:52am

Try this...

 awk '{if(val==$1){printf "%s",substr($0,length(val)+1)}else{if(NR>1)print "";val=$1;printf "%s",$0}}END{print ""}' input_file

If you are on Solaris, use nawk.

--ahamed

andrejm · October 1, 2011, 9:40am

I'm on Mac OS X, and the code doesn't produce the expected result: the output looks the same as the input.

Thanks
Andrej

ltomuno · October 1, 2011, 9:52am

sed -n 's/DN[0-9]\{2\}-[0-9]\{3\} //p' input|awk 'BEGIN{ORS=""}{print $0 ($0 ~ /\.$/ ? "\n" : " ")}'

durden_tyler · October 1, 2011, 10:30am

$ cat f9
DN11-001 Thats the first line which needs to be
DN11-001 joined with the second line and also to
DN11-001 the third line as they all begin with the same
DN11-001 document number.
DN11-002 The number of lines differ
DN11-002 among the documents.
DN11-005 It can also be just one line.
$ perl -lne '/^(.*?) (.*)$/;
             if (! defined $x{$1}) {print $k,$v while ($k,$v)=each %x; %x=()}
             $x{$1}.=" $2";
             END {print $k,$v while ($k,$v)=each %x}' f9
DN11-001 Thats the first line which needs to be joined with the second line and also to the third line as they all begin with the same document number.
DN11-002 The number of lines differ among the documents.
DN11-005 It can also be just one line.

tyler_durden

andrejm · October 1, 2011, 10:56am

All three solutions work, thank you! The problem is that they only work on the sample data; the real data obviously contains some special characters that need to be cleaned up. I guess I need to check which characters are not allowed?

Andrej
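One way to see which special characters the real file contains is to count its carriage returns (\r) and line feeds (\n); "od -c RealData.txt | head" shows them as well. The sketch below assumes the problem is the line-ending convention, a common culprit when a file comes from another system; the helper name count_line_endings is hypothetical.

```shell
# Count carriage returns (\r) and line feeds (\n) in a file:
#   CR > 0, LF = 0       : old Mac-style line endings (\r only)
#   CR = LF (both > 0)   : Windows-style line endings (\r\n)
#   CR = 0               : Unix-style line endings (\n only)
# count_line_endings is a hypothetical helper name.
count_line_endings() {
    # tr -cd deletes every byte except the one given; wc -c counts what is left.
    # tr -d ' ' strips the padding that BSD wc puts before the number.
    cr=$(LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(LC_ALL=C tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    echo "CR=$cr LF=$lf"
}

# Demo on a small file with \r-only line endings.
sample=$(mktemp "${TMPDIR:-/tmp}/eol.XXXXXX")
printf 'DN11-002 The number of lines differ\rDN11-002 among the documents.\r' > "$sample"
count_line_endings "$sample"    # prints CR=2 LF=0
rm -f "$sample"
```

If the counts show \r without \n, awk, sed and perl will treat the whole file as one single line.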

durden_tyler · October 1, 2011, 11:11am

Can you post an example of your real data?

tyler_durden

ravi_agarwalla · October 2, 2011, 1:29am

awk '{if(val==$1){printf "%s",substr($0,length(val)+1)}else{if(NR>1)print "";val=$1;printf "%s",$0}}END{print ""}' input_file

Can someone explain the above command? How does it work?
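In short, the one-liner keeps the document number of the row it is building in the variable val. For each line it compares the first field ($1) with val: if they match, it prints the line without the document number and without a newline, so the text is appended to the current row; if they differ, it ends the current row with a newline (except before the very first line), stores the new number in val and prints the whole line. The END block adds the final newline. The commented sketch below follows the same logic; it uses printf "%s" and substr() so that a % in the text cannot be taken as a format specifier and the number is removed only from the start of the line. The function name join_by_id is hypothetical.

```shell
# Join consecutive lines that share the same first field (document number).
# join_by_id is a hypothetical name; it reads the files given as arguments,
# or standard input when there are none.
join_by_id() {
    awk '
    {
        if (val == $1) {
            # Same document number as the previous line: skip the number
            # (keeping the space after it) and append the rest to the row.
            printf "%s", substr($0, length(val) + 1)
        } else {
            # New document number: end the previous row (unless this is
            # line 1), remember the number and start a new row.
            if (NR > 1) print ""
            val = $1
            printf "%s", $0
        }
    }
    # End the last row with a newline.
    END { print "" }
    ' "$@"
}

# Feed it the sample from the question.
result=$(printf '%s\n' \
    'DN11-001 Thats the first line which needs to be' \
    'DN11-001 joined with the second line and also to' \
    'DN11-001 the third line as they all begin with the same' \
    'DN11-001 document number.' \
    'DN11-002 The number of lines differ' \
    'DN11-002 among the documents.' \
    'DN11-005 It can also be just one line.' | join_by_id)
printf '%s\n' "$result"
```

This prints the three joined rows shown in the question. Because a row is only closed when the number changes, lines of one document must be adjacent in the input.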

andrejm · October 2, 2011, 4:48am

Thank you, I thought nobody would care about it anymore! Attached is a sample of real data.

Andrej

ravi_agarwalla · October 2, 2011, 4:52am

Hi,

I want to know how the command works with the above data. It would be good if you could explain the command and how it joins the lines.

awk '{if(val==$1){printf "%s",substr($0,length(val)+1)}else{if(NR>1)print "";val=$1;printf "%s",$0}}END{print ""}' input_file

ahamed101 · October 2, 2011, 2:26pm

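RealData.txt uses \r (carriage return) as its line separator, so awk, sed and perl see the whole file as one single line. Use the command below to replace each \r with \n: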

sed -i 's/\r/\n/g' RealData.txt

And then execute the commands we have given!

--ahamed

---------- Post updated at 11:26 AM ---------- Previous update was at 11:24 AM ----------

btw, I don't have a Mac. I do all of this on BackTrack, an Ubuntu-based Linux distribution, so the sed -i command above is GNU sed syntax.

--ahamed
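Because sed -i 's/\r/\n/g' relies on GNU sed, it may not behave the same on Mac OS X: BSD sed expects an argument after -i (for example -i '') and may not interpret \r and \n in the expression the way GNU sed does. A more portable sketch uses tr, which accepts the \r and \n escapes on both systems, and pipes the result straight into the join step. The sample file below only stands in for RealData.txt, which is not reproduced here; the awk program is the same join logic used earlier in the thread, written with printf "%s" and substr().

```shell
# Portable conversion of \r line endings to \n with tr, followed by the join.
# The temporary sample file stands in for RealData.txt.
sample=$(mktemp "${TMPDIR:-/tmp}/realdata.XXXXXX")
printf 'DN11-002 The number of lines differ\rDN11-002 among the documents.\rDN11-005 It can also be just one line.\r' > "$sample"

# tr writes to stdout, so no in-place editing flag is needed.
result=$(tr '\r' '\n' < "$sample" | awk '
    {
        if (val == $1) {
            # Same document number: append the text after the number.
            printf "%s", substr($0, length(val) + 1)
        } else {
            # New document number: start a new row.
            if (NR > 1) print ""
            val = $1
            printf "%s", $0
        }
    }
    END { print "" }
')
printf '%s\n' "$result"
rm -f "$sample"
```

To keep a converted copy instead, tr '\r' '\n' < RealData.txt > RealData_unix.txt works the same way. If a file has Windows-style \r\n endings, tr -d '\r' is the better choice, because replacing \r with \n there would leave an empty line after every line.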