search a long line

mpang · March 21, 2007, 5:14am

Hey all, I need to retrieve something from a line, say

I need to match <ID>111</ID>, so I want to retrieve

is this possible, can anyone help? Thank you!

anbu23 · March 21, 2007, 5:38am

sed "s;\(</TEST>\)\(<TEST>\);\1\\
\2;g" f | sed -n "s/.*\(<ID>1\{1,\}.*\)/\1/p"

mpang · March 21, 2007, 5:50am

I don't quite get what exactly that command does, do you mind explaining it a little? Thanks!

ghostdog74 · March 21, 2007, 5:55am

if you have Python and know the language, here's an alternative:

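#!/usr/bin/python
import sys

choice = sys.argv[1]                 # ID to look for, e.g. 1111
needle = "<ID>%s</ID>" % choice      # search for the requested ID, not a fixed one
for line in open("file"):
    # every <TEST> record becomes one chunk of the long line
    for li in line.split("<TEST>"):
        if needle in li:
            ind = li.index(needle)
            # print from the matching <ID> tag to the end of the record
            print(li[ind:].rstrip("\n"))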

output:

# ./test.py 1111
<ID>1111</ID><C>789</C><D>000</D></TEST>

anbu23 · March 21, 2007, 6:10am

$ cat file
<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST><TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>

The first sed puts each TEST element on its own line

$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file
<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST>
<TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>

The second sed prints from <ID>, followed by one or more 1s, to the end of the line.
1\{1,\} matches one or more 1s,
for example 1, 11, 111, 1111 and so on.

$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>1\{1,\}.*\)/\1/p"
<ID>1111</ID><C>789</C><D>000</D></TEST>
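
For anyone who finds the two sed steps hard to read, here is a rough Python sketch of the same idea. It is only an illustration, not a replacement for the command above: the function names are made up for this example, and it assumes each record holds a single <ID> tag, as in the sample.

```python
import re

def split_records(line):
    # Step 1: put a newline between adjacent </TEST><TEST> tags,
    # then split so that every TEST element is its own string.
    return re.sub(r"(</TEST>)(<TEST>)", r"\1\n\2", line).split("\n")

def extract(line):
    # Step 2: for each record, keep everything from "<ID>" followed by
    # one or more 1s up to the end of that record.
    hits = []
    for record in split_records(line):
        m = re.search(r"<ID>1+.*", record)
        if m:
            hits.append(m.group(0))
    return hits

sample = ("<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST>"
          "<TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>")

for hit in extract(sample):
    print(hit)   # <ID>1111</ID><C>789</C><D>000</D></TEST>
```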

mpang · March 21, 2007, 6:32am

but my ID is random, say 28654..then what should I do?

anbu23 · March 21, 2007, 6:36am

id="28654"
sed "s;\(</TEST>\)\(<TEST>\);\1\\
\2;g" file | sed -n "s/.*\(<ID>${id}.*\)/\1/p"
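
A rough Python equivalent for an arbitrary ID, in case it helps to compare. This is a sketch under assumptions: the function name and the sample line with 28654 are invented for the example, and unlike the sed above it also requires the closing </ID> tag, so that a shorter ID such as 2865 does not match 28654 by accident.

```python
import re

def extract_from_id(text, wanted_id):
    # re.escape keeps the ID literal; requiring "</ID>" right after it
    # means only an exact ID matches. ".*?</TEST>" stops at the end of
    # the record that contains the ID.
    pattern = re.compile(r"<ID>" + re.escape(wanted_id) + r"</ID>.*?</TEST>")
    m = pattern.search(text)
    return m.group(0) if m else None

# Made-up sample line in the same shape as the file shown earlier.
sample = ("<TEST><A>123</A><B>456</B><ID>28654</ID><C>789</C><D>000</D></TEST>"
          "<TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>")

print(extract_from_id(sample, "28654"))  # <ID>28654</ID><C>789</C><D>000</D></TEST>
print(extract_from_id(sample, "2865"))   # None
```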

mpang · March 21, 2007, 6:58am

it doesn't work...nothing returned

anbu23 · March 21, 2007, 7:09am

Can you check whether your input contains 28654?

mpang · March 21, 2007, 7:31am

yes, I did

matrixmadhan · March 21, 2007, 7:38am

b=1111

sed 's/\(.*<\/TEST>\)<TEST>\(.*<\/TEST>\)/\1\
<TEST>\2/' filename | sed -ne "/<ID>$b/s/^.*<\/B>\s*//p"

One more, using '\s'

anbu23 · March 21, 2007, 7:41am

$ cat file
<TEST><A>123</A><B>456</B><ID>1111</ID><C>789</C><D>000</D></TEST><TEST><A>123</A><B>456</B><ID>2222</ID><C>789</C><D>000</D></TEST>
$ id="1111"
$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>${id}.*\)/\1/p"
<ID>1111</ID><C>789</C><D>000</D></TEST>
$ id="2222"
$ sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>${id}.*\)/\1/p"
<ID>2222</ID><C>789</C><D>000</D></TEST>

I tried it with your sample and it's working.
Can you show your input?

matrixmadhan · March 21, 2007, 7:45am

I'm sorry if I'm wrong,

are you trying the following code posted earlier?

sed "s;\(</TEST>\)\(<TEST>\);\1\\
> \2;g" file | sed -n "s/.*\(<ID>1\{1,\}.*\)/\1/p"

If so, it wouldn't work for values specified through a variable.

Could you please check that?
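
To illustrate why that earlier pattern cannot find an ID like 28654, here is a small Python check. It is just a sketch: `<ID>1+` is the Python spelling of `<ID>1\{1,\}`, and the helper name is made up for this example.

```python
import re

# Same idea as the sed pattern <ID>1\{1,\}: "<ID>" followed by one or more 1s.
ones_only = re.compile(r"<ID>1+")

def matches_ones_only(sample_id):
    return bool(ones_only.search("<ID>%s</ID>" % sample_id))

for sample_id in ("1", "1111", "28654"):
    print(sample_id, matches_ones_only(sample_id))
# 1 True
# 1111 True
# 28654 False
```

The pattern is tied to the digit 1, so an ID that starts with any other digit never matches, no matter what the variable holds.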

mpang · March 21, 2007, 7:55am

thanks everyone, especially anbu23!

My input is a very big XML file with foreign-language characters, but I would assume that doesn't make much difference. The structure of the XML is just as I described. I will spend more time checking whether I missed any slash or space in the command.

Although things aren't exactly working out right yet, I know which direction I should look in. I will read up on "sed", thanks again!

mpang · March 21, 2007, 7:56am

Yes, I am. I simply copied and pasted it (in case I mistyped anything). Why wouldn't it work?