Help needed in formatting the Output file

raosr020 · August 28, 2014, 2:47pm

Hi All,

I need your help resolving the issue below.

I have a file called "data.txt" with the following lines:

TT: <tell://me/sreenivas> 
<tell://me/100>

TT: <tell://me/sudheer> 
<tell://me/300>

TT: <tell://me/sreenivas> 
<tell://me/200>

TT: <tell://me/sudheer> 
<tell://me/400>

I want the output in the format below. Please help me.

TT: <tell://me/sreenivas>
<tell://me/100>
<tell://me/200>

TT: <tell://me/sudheer> 
<tell://me/300>
<tell://me/400>

Explanation of the above output:
If the text between "<tell://me/" and ">" is the same on several lines that contain "TT", keep only one of those lines.
That line should then be followed by all the lines that originally followed any of the "TT" lines with that same text.

Looking forward to your help as soon as possible. Let me know if you have any questions.

With Regards,
SRK

bakunin · August 28, 2014, 3:05pm

Is your input always guaranteed to consist of groups of 2 lines, one starting with "TT:" and the other with a "<tell..." clause?

If so, create a sort of table with each 2-line group joined into one line, like this:

TT: <tell://me/sreenivas> <tell://me/100>
TT: <tell://me/sudheer> <tell://me/300>

You can do this easily with a single sed command. Sorting the result puts all lines with equal keys next to each other. The last step is a "control break", which is a basic algorithm in programming: walk through the sorted lines and print the key only when it differs from the key on the previous line.
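As a rough sketch of the three steps described above (join, sort, control break): this assumes every "TT:" line is followed by exactly one value line and that neither contains embedded spaces. The function name group_by_tt and the file sample.txt are made up for illustration, and sort -s (stable sort) is a GNU/BSD extension, not POSIX.

```shell
# Hypothetical sample file standing in for data.txt from the question.
cat > sample.txt <<'EOF'
TT: <tell://me/sreenivas>
<tell://me/100>

TT: <tell://me/sudheer>
<tell://me/300>

TT: <tell://me/sreenivas>
<tell://me/200>

TT: <tell://me/sudheer>
<tell://me/400>
EOF

# group_by_tt FILE  (illustrative name, not from the thread)
group_by_tt() {
    # Step 1: join each TT: line with the line after it (blank lines are dropped).
    sed -n '/^TT:/{N;s/ *\n/ /;p;}' "$1" |
    # Step 2: stable sort on the key in field 2, so equal keys become adjacent
    #         and their values keep their input order.
    LC_ALL=C sort -s -k2,2 |
    # Step 3: control break: print the key only when it changes.
    awk '$2 != prev { if (NR > 1) print ""; print $1, $2; prev = $2 }
         { print $3 }'
}

group_by_tt sample.txt
```

Because of the sort, the groups come out in key order rather than in the order they first appear in the file; for the sample data the two happen to coincide.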

I hope this helps.

bakunin

Don_Cragun · August 28, 2014, 3:41pm

This is also pretty easy with awk:

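awk '
NF == 0 { next }	# skip blank lines
/^TT/ {	# key is "TT:" plus the URL; keep only the first line seen per key
	if(!((key = $1 FS $2) in out))
		out[key] = $0
	next
}
{	out[key] = out[key] "\n" $0 }	# append other lines to the current group
END {	# note: for(key in out) visits the groups in an unspecified order
	for(key in out)
		printf("%s\n\n", out[key])
}' data.txt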

This will work even if there are multiple non-blank lines between the lines starting with TT:.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
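One thing to keep in mind is that awk does not define the order in which for(key in out) visits the array, so the groups may not come out in input order. If that matters, a variant along the following lines might help. It is only a sketch: the function name group_in_order, the order[] array, and the file sample.txt are additions for illustration, not part of the script above.

```shell
# Hypothetical sample file standing in for data.txt from the question.
cat > sample.txt <<'EOF'
TT: <tell://me/sreenivas>
<tell://me/100>

TT: <tell://me/sudheer>
<tell://me/300>

TT: <tell://me/sreenivas>
<tell://me/200>

TT: <tell://me/sudheer>
<tell://me/400>
EOF

# group_in_order FILE  (illustrative name, not from the thread)
group_in_order() {
    awk '
    NF == 0 { next }                    # skip blank separator lines
    /^TT:/ {
        key = $1 FS $2                  # "TT:" plus the URL
        if (!(key in out)) {            # first time this key is seen
            out[key] = key
            order[++n] = key            # remember its first-seen position
        }
        next
    }
    { out[key] = out[key] "\n" $0 }     # append value line to current group
    END {
        # Walk the keys in first-seen order, one blank line between groups.
        for (i = 1; i <= n; i++)
            printf("%s%s\n", (i > 1 ? "\n" : ""), out[order[i]])
    }' "$1"
}

group_in_order sample.txt
```

Like the original, this still copes with several non-blank lines after each TT: line; the only change in behaviour is the fixed output order and the absence of a trailing blank line.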

raosr020 · August 30, 2014, 3:25pm

Thanks Don. It helped me a lot!