Extracting a string from a line

brunlea · January 30, 2013, 11:44am

Hi,

I have searched all over the forums for a problem similar to mine. I found many, but unfortunately I have not been able to get any of them to work for me!

What I am attempting to do is extract part of a line in a file. This kind of line appears multiple times in the file. Example file:

<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>

I have tried using

sed 's/\([A-Z]{3}[0-9]{4}[A-Z]{2}\).*/\1/' file

I expected this to remove everything on the line except the AAA1234AA string, but it does not remove anything at all. Is there any way to extract only that string?

As you can probably tell, I'm quite inexperienced in Unix scripting. I'm on HP-UX.

Thanks

---------- Post updated at 11:44 AM ---------- Previous update was at 11:21 AM ----------

I have also tried

sed -n 's/.*\([A-Z]{3}[0-9]{4}[A-Z]{2}\).*/\1/p' file

But this has not worked either.

However, if I try

sed -n 's/.*\(AAA1234AA\).*/\1/p' file

this brings back the desired result (but only for that specific string).

Any ideas?
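A likely explanation for the two failed attempts: sed uses POSIX basic regular expressions (BRE), where `{3}` matches the literal characters `{3}` rather than "three times". In a BRE the interval has to be written `\{3\}`. The first attempt would additionally need a leading `.*`, otherwise the text before the code is kept. A minimal sketch of the second attempt with only the braces escaped, wrapped in a shell function whose name, `extract_codes`, is made up for illustration:

```shell
# Hypothetical helper: print only the AAA1234AA-style codes.
# In a sed BRE, {3} is literal text; \{3\} means "exactly three times".
# Reads the files given as arguments, or standard input if none are given.
extract_codes() {
    sed -n 's/.*\([A-Z]\{3\}[0-9]\{4\}[A-Z]\{2\}\).*/\1/p' "$@"
}

# Usage (assuming the sample data is in a file called "file"):
#   extract_codes file
```

Assuming the C locale, where `[A-Z]` means only ASCII capitals, the lowercase `zzz1234ZZ` line mentioned later in the thread would not be matched by this pattern.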

Yoda · January 30, 2013, 11:47am

Using awk

awk -F'[<|>]' '{ print $8 }' filename

brunlea · January 30, 2013, 12:06pm

That does not work, because I forgot to mention that the file also contains other lines in this format:

<m:string>zzz1234ZZ</m:string>

Is there a way to adjust that command to get the string, bearing in mind the different lines in the file?

vbe · January 30, 2013, 12:08pm

Are they always situated at the same place (position in the line)?

Yoda · January 30, 2013, 12:10pm

Try this instead:

awk -F'[<|>]' '{for(i=1;i<=NF;i++) { if($i=="m:string") print $(i+1); }}' filename
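To see why the fixed `$8` in the first awk answer fails on the bare lines while this loop works, it can help to dump every field awk produces with the same separator. (The class `[<|>]` matches `<`, `|` or `>`; the `|` is harmless here because the data contains none.) A sketch, with a hypothetical function name `show_fields`:

```shell
# Hypothetical helper: print "number:[contents]" for every field, splitting
# on "<" or ">" exactly like the awk answers above, with "--" between lines.
show_fields() {
    awk -F'[<|>]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i; print "--" }' "$@"
}

# Usage:
#   show_fields filename
```

For a SOAP line the code is field 8, right after the `m:string` field (7); for `<m:string>zzz1234ZZ</m:string>` it is field 3, after `m:string` in field 2. So a fixed `$8` only covers the first format, while searching for the `m:string` field and printing the next one covers both.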

mstafreshi · January 30, 2013, 12:32pm

sed 's/.*<m:string>// 
        s/<\/m:string>.*//'  filename
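Because this version runs without `-n`, any line that contains no `m:string` tag is printed unchanged. If the real file has such lines, a variant can suppress default output and print only when the second substitution succeeded, using the `p` flag. A sketch; the function name `between_tags` is hypothetical:

```shell
# Hypothetical helper: same two substitutions as above, but with -n so
# nothing is printed by default; the trailing p prints a line only when
# the second substitution (removing </m:string> and what follows) matched.
between_tags() {
    sed -n -e 's/.*<m:string>//' -e 's/<\/m:string>.*//p' "$@"
}

# Usage:
#   between_tags filename
```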

shamrock · January 30, 2013, 1:22pm

awk -F "<(/)?m:string>" '{print $2}' file
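If the real file also contains lines with no `m:string` tag at all, this field separator leaves them with a single field, so `$2` is empty and awk prints a blank line for each. A pattern can skip those lines; a sketch with a hypothetical function name `tag_text`:

```shell
# Hypothetical helper: split on either <m:string> or </m:string>.
# Lines with no such tag have at most one field, so NF > 1 skips them
# instead of printing an empty line.
tag_text() {
    awk -F '<(/)?m:string>' 'NF > 1 { print $2 }' "$@"
}

# Usage:
#   tag_text file
```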

gary_w · January 30, 2013, 1:50pm

How about sed, printing the wanted string only when the m:string tags are inside a SOAP tag:

$ cat x.dat
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
$ sed -n 's/<SOAP.*<m:string>\(.*\)<\/m:string>.*SOAP>/\1/p' x.dat
AAA1234AA
BBB1234BB
CCC1234CC
DDD1234DD
$

Ashish_Rathour · January 30, 2013, 2:32pm

100% sure this will solve your problem. Check this:

awk -F ">" '{print $4}' file | awk -F "<" '{print $1}'
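The two awk stages can also be folded into one awk process with `split()`. A sketch (the function name `fourth_field` is hypothetical). Like the pipeline, it assumes the code sits in the fourth `>`-separated field, so a bare `<m:string>zzz1234ZZ</m:string>` line, which has only three such fields, produces an empty line:

```shell
# Hypothetical helper equivalent to the two-stage pipeline:
#   awk -F ">" '{print $4}' | awk -F "<" '{print $1}'
# Take the 4th ">"-separated field, then keep the text before its first "<".
fourth_field() {
    awk -F '>' '{ split($4, part, "<"); print part[1] }' "$@"
}

# Usage:
#   fourth_field file
```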

brunlea · January 31, 2013, 5:57am

I have used this one. It works perfectly!

Thanks to all those who contributed.

gary_w · January 31, 2013, 9:42am

Ah, I believe I misunderstood which data you needed. For the sake of completeness, here is my fixed sed solution:

 
$ cat x.dat
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
$ sed -n 's/.*<m:string>\(.*\)<\/m:string>.*/\1/p' x.dat
AAA1234AA
BBB1234BB
zzz1234ZZ
CCC1234CC
DDD1234DD
$

brunlea · January 31, 2013, 9:59am

That looks a lot cleaner to me, and I understand it better! Thanks