search a long line

Hey all, I need to retrieve something from a line, say

<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST><TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>

I need to match <ID>1111</ID>, so I want to retrieve

<ID>1111</ID><C>789</C><D>000</D></TEST>

Is this possible? Can anyone help? Thank you!

sed "s;\(</TEST>\)\(<TEST>\);\1\\
\2;g" f | sed -n "s/.*\(<ID>1\{1,\}.*\)/\1/p" 

I don't quite get what exactly that command does. Would you mind explaining it a little? Thanks!

if you have Python and know the language, here's an alternative:

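#!/usr/bin/python
import sys

# ID to look for, passed as the first command-line argument
choice = sys.argv[1]
tag = "<ID>%s</ID>" % choice

with open("file") as f:
    for line in f:
        # each chunk is one <TEST> record (minus its opening tag)
        for li in line.split("<TEST>"):
            if tag in li:
                # print from the matching <ID> tag to the end of the record
                print(li[li.index(tag):])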

output:

# ./test.py 1111
<ID>1111</ID><C>789</C><D>000</D></TEST>
$ cat file
<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST><TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>

The first sed puts each <TEST>...</TEST> record on its own line:

$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file
<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST>
<TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>

The second sed matches from <ID> followed by one or more 1s through to the end of the line, and prints only that part.
1\{1,\} matches one or more consecutive 1s,
for example 1, 11, 111, 1111 and so on.

$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>1\{1,\}.*\)/\1/p"
<ID>1111</ID><C>789</C><D>000</D></TEST>
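For anyone who finds the two sed stages hard to read, here is a rough Python sketch of the same idea. It is not a translation of the sed internals, just the same two steps; the variable names are made up for illustration, and it assumes each record holds a single <ID> tag.

```python
import re

line = ("<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST>"
        "<TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>")

# Stage 1: put a newline between a closing </TEST> and the next <TEST>,
# so every record ends up on its own line.
records = line.replace("</TEST><TEST>", "</TEST>\n<TEST>").split("\n")

# Stage 2: "1+" plays the role of 1\{1,\} (one or more 1s).
# Keep only the records that contain <ID> followed by 1s, and keep the
# text from that <ID> to the end of the record.
pattern = re.compile(r"<ID>1+.*")
matches = [m.group(0) for m in map(pattern.search, records) if m]

for text in matches:
    print(text)
```

Only the first record survives stage 2, because the second record's ID starts with 2 rather than 1.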

But my ID is arbitrary, say 28654. What should I do then?

id="28654"
sed "s;\(</TEST>\)\(<TEST>\);\1\\
\2;g" file | sed -n "s/.*\(<ID>${id}.*\)/\1/p"

It doesn't work... nothing is returned.

Can you check whether your input actually contains 28654?

yes, I did

b=1111
sed 's/\(.*<\/TEST>\)<TEST>\(.*<\/TEST>\)/\1\
<TEST>\2/' filename | sed -ne "/<ID>$b/s/^.*<\/B>\s*//p"

One more, using '\s' (note that \s is a GNU sed extension) :)

$ cat file
<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST><TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>
$ id="1111"
$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>${id}.*\)/\1/p"
<ID>1111</ID><C>789</C><D>000</D></TEST>
$ id="2222"
$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>${id}.*\)/\1/p"
<ID>2222</ID><C>789</C><D>000</D></TEST>

I tried it with your sample and it's working.
Can you show your input?
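One thing to watch with the ${id} version above: the pattern <ID>${id}.* has no closing tag, so an id of 11 would also match <ID>1111</ID>. Here is a small Python sketch of a stricter match. The function name extract_from_id is hypothetical, and it assumes records never nest.

```python
import re

def extract_from_id(line, wanted_id):
    """Return the text from <ID>wanted_id</ID> to the end of each matching record."""
    # re.escape guards against regex metacharacters in the ID;
    # the literal </ID> stops "11" from matching "<ID>1111</ID>".
    pattern = re.compile(r"<ID>" + re.escape(wanted_id) + r"</ID>.*?</TEST>")
    return [m.group(0) for m in pattern.finditer(line)]

sample = ("<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST>"
          "<TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>")

print(extract_from_id(sample, "2222"))
print(extract_from_id(sample, "11"))
```

The non-greedy .*? stops at the first </TEST> after the ID, so there is no need to split the line into records first.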

Sorry if I'm wrong, but

are you trying the following code posted earlier?

sed "s;\(</TEST>\)\(<TEST>\);\1\\
\2;g" file | sed -n "s/.*\(<ID>1\{1,\}.*\)/\1/p"

If so, it won't work for values specified through a variable, because the ID pattern is hard-coded to match only runs of 1s.

Could you please check that? :)

thanks everyone, especially anbu23!

My input is a very big XML file with foreign-language characters, but I would assume it doesn't make much difference. The structure of the XML is just as I described; I will spend more time checking whether I missed any slash or space in the command.
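Whether the foreign characters are the cause here is not known; that is only a guess. If the file's encoding turns out to be a problem, one way to sidestep it is to search the raw bytes, so nothing has to be decoded. A minimal sketch, assuming the ID itself is plain ASCII; the function name and the sample data are made up for illustration.

```python
import re

def extract_from_bytes(data, wanted_id):
    """Find <ID>wanted_id</ID>...</TEST> in raw bytes, without decoding them."""
    needle = re.escape(wanted_id.encode("ascii"))
    # DOTALL lets "." match any byte, including newlines inside a record.
    pattern = re.compile(b"<ID>" + needle + b"</ID>.*?</TEST>", re.DOTALL)
    return [m.group(0) for m in pattern.finditer(data)]

# Invented sample: UTF-8 text in one record, a stray invalid byte in the other.
sample = ("<TEST><A>\u00e9</A><ID>28654</ID><C>\u65e5\u672c</C></TEST>".encode("utf-8")
          + b"<TEST><A>\xff</A><ID>2222</ID><C>789</C></TEST>")

for hit in extract_from_bytes(sample, "28654"):
    print(hit)
```

For a real file the bytes would come from opening it in binary mode, for example open("file", "rb").read(), which loads the whole file into memory at once.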

Although things aren't exactly working out right yet, I know which direction I should look in. I will read up on "sed". Thanks again!

Yes, I am. I simply copied and pasted it (in case I mistyped anything). Why wouldn't it work?