Help needed in formatting the output

Hi All,

I need your help resolving the issue below.

I've a file called "data.txt" with the below lines:

TT: <tell://me/sreenivas> 
 <tell://me/100>

<tell://me/500>

TT: <tell://me/sudheer>  
<tell://me/300>  

TT: <tell://me/sreenivas>  
<tell://me/200>  

TT: <tell://me/sudheer>  
<tell://me/400>

I need the output in any one of the formats below. Please help me.

TT: <tell://me/sreenivas> 
<tell://me/100> 
<tell://me/200>

<tell://me/500>

TT: <tell://me/sudheer> 
 <tell://me/300> 
<tell://me/400>

(or)

TT: <tell://me/sreenivas> 
<tell://me/100> 
<tell://me/200>

TT: <tell://me/sudheer>  
<tell://me/300> 
<tell://me/400>

<tell://me/500>

(or)

<tell://me/500>

TT: <tell://me/sreenivas> 
<tell://me/100> 
<tell://me/200>

TT: <tell://me/sudheer> 
 <tell://me/300> 
<tell://me/400>

Explanation of the above output:
If the text between "<tell://me/" and ">" is the same on two or more lines that contain "TT", keep only one of those lines.
That line should be followed by all of the lines that followed the original TT lines with that same text.

And whenever a line is not preceded by a TT line, that line should be kept as it is.

Looking forward to your help as soon as possible. Let me know if you have any questions.

With Regards,
Sree

Is this minor modification of the script supplied in your last thread sufficient?

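awk '
NF == 0 { next }	# skip blank lines
# TT: line: keep the first line seen for each key and flag that the
# next line belongs to this key.
/^TT/ {	if(!((key = $1 FS $2) in out))
		out[key] = $0
	TTf = 1
	next
}
# Line immediately after a TT: line: append it to the group for that key.
TTf {	out[key] = out[key] "\n" $0
	TTf = 0
	next
}
# Any other line is printed as is, followed by a blank line.
{	print $0 "\n"
}
# Print each group; for(key in out) visits keys in an unspecified order.
END {	for(key in out)
		printf("%s\n\n", out[key])
}' data.txt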

which produces the output:

<tell://me/500>

TT: <tell://me/sudheer>  
<tell://me/300>  
<tell://me/400>

TT: <tell://me/sreenivas> 
 <tell://me/100>
<tell://me/200>  

for the input you provided in this thread. It makes the assumption that a line starting with TT: will not be followed by a blank line. If that is a problem for you, please try modifying the script to remove that assumption and let us know how it works for you.
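
One possible way to drop that assumption, as a sketch only: clear the flag whenever a blank line is seen, so that a line following a blank line is never attached to an earlier TT: line. The function name merge_tt_blank_safe is made up for illustration, and keeping a TT: line that has no following line as a group of its own is an assumption, since the thread does not say what should happen in that case.

```shell
# merge_tt_blank_safe is a hypothetical name, not something from the thread.
merge_tt_blank_safe() {
	awk '
	NF == 0 { TTf = 0; next }	# a blank line ends the current TT: group
	/^TT/ {	if(!((key = $1 FS $2) in out))
			out[key] = $0
		TTf = 1
		next
	}
	TTf {	out[key] = out[key] "\n" $0
		TTf = 0
		next
	}
	{	print $0 "\n" }
	END {	for(key in out)
			printf("%s\n\n", out[key])
	}' "$@"
}
```

It can be called as merge_tt_blank_safe data.txt, or fed from a pipe.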

If preserving the order of the 1st occurrence of the TT: line values is important, that can be fixed too. But, since the TT: input lines for a given value aren't adjacent in your input file, I assume the order is not important in your output file.
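
If the order does matter, one way to keep it is to record each key the first time it is seen and loop over that record in the END block, instead of relying on for(key in out), whose order is unspecified. This is a minimal sketch under that assumption; the function name merge_tt_ordered and the order and n variables are introduced here for illustration only.

```shell
# merge_tt_ordered is a hypothetical name, not something from the thread.
merge_tt_ordered() {
	awk '
	NF == 0 { next }			# skip blank lines
	/^TT/ {
		key = $1 FS $2			# e.g. "TT: <tell://me/sreenivas>"
		if(!(key in out)) {
			out[key] = $0
			order[++n] = key	# remember first-seen order
		}
		TTf = 1
		next
	}
	TTf {	out[key] = out[key] "\n" $0	# line right after a TT: line
		TTf = 0
		next
	}
	{	print $0 "\n" }			# standalone line, printed as is
	END {	for(i = 1; i <= n; i++)
			printf("%s\n\n", out[order[i]])
	}' "$@"
}
```

Standalone lines are still printed as they are read, so they appear before the TT: groups; only the order of the groups themselves changes.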


Thanks, Don, for your help!

awk -F '\n' '$1 in a {a[$1] = (a[$1] FS $2); next} {a[$1] = $0} END {for(x in a) print a[x]}' RS= ORS='\n\n' file

When I try this with the sample input given in this thread, I get the output:

TT: <tell://me/sudheer>  
<tell://me/300>  
<tell://me/400>

TT: <tell://me/sreenivas>  
<tell://me/200>  

TT: <tell://me/sreenivas> 
 <tell://me/100>

<tell://me/500>

which doesn't seem to be even close to what was requested. How can taking pairs of input lines (without verifying that the 1st line in a pair starts with TT: ) do what was requested?

Hi Don, there are unwanted spaces at the end of the lines, and they are affecting the command.
I would suggest you remove the unwanted spaces and try again.
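
The clean-up suggested above could be sketched as a sed step in front of the paragraph-mode awk command, so that the TT: lines compare equal regardless of trailing blanks. This assumes that losing the trailing whitespace in the output is acceptable, which the thread does not settle; the function name merge_paragraphs is hypothetical.

```shell
# merge_paragraphs is a hypothetical name, not something from the thread.
merge_paragraphs() {
	# Strip trailing whitespace from every line, then read the input in
	# paragraph mode (RS empty) with one line per field (FS is a newline).
	sed 's/[[:space:]]*$//' "$@" |
	awk -F '\n' -v RS= -v ORS='\n\n' '
		$1 in a { a[$1] = (a[$1] FS $2); next }	# key seen before: append 2nd line
		{ a[$1] = $0 }				# first time: keep the whole record
		END { for(x in a) print a[x] }		# output order is unspecified
	'
}
```

As with the original one-liner, the groups come out in whatever order awk walks the array.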

Hi SriniShoo,
Maybe your code thinks there are unwanted spaces, but the sample output provided by the submitter seems to want those trailing spaces that were present in the input to be there in the output as well.

If your code depends on modifying the given sample input, maybe you should warn readers that your code only works if the input is modified to meet your additional requirements (or modify your suggested code to massage the sample input provided by the submitter into the format your code requires) and explain that it does not provide the output requested by the submitter.

  • Don