Extracting a string from a line

Hi,

I have searched all over the forums for a problem similar to mine. I have found many, but unfortunately I have not been able to get any of them to work for me.

What I am attempting to do is extract part of a string from a line in a file. Lines of this form appear multiple times in the file. Example file:

<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>

I have tried using

sed 's/\([A-Z]{3}[0-9]{4}[A-Z]{2}\).*/\1/' file

I expected this to remove everything on the line apart from the AAA1234AA string, but it does not remove anything at all. Is there any way I can extract the string only?

I'm quite inexperienced in Unix scripting, as you can probably tell. I'm on HP-UX.

Thanks

---------- Post updated at 11:44 AM ---------- Previous update was at 11:21 AM ----------

I have also tried

sed -n 's/.*\([A-Z]{3}[0-9]{4}[A-Z]{2}\).*/\1/p' file

but this has not worked either.

However, if I try

sed -n 's/.*\(AAA1234AA\).*/\1/p' file

this brings back the desired result (but only for that specific string).

Any ideas?
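
A likely reason the bracketed pattern never matches: sed uses basic regular expressions by default, and there a repetition count is written \{3\}, not {3}. An unescaped {3} is matched as the literal characters {3}. A minimal sketch, assuming a scratch file called sample.txt (a made-up name) that holds two of the example lines:

```shell
# sample.txt is a hypothetical scratch file with lines copied from the example above
cat > sample.txt <<'EOF'
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
EOF

# Same command as the second attempt, but with the braces escaped as \{n\}.
# LC_ALL=C keeps [A-Z] limited to the uppercase ASCII letters.
result=$(LC_ALL=C sed -n 's/.*\([A-Z]\{3\}[0-9]\{4\}[A-Z]\{2\}\).*/\1/p' sample.txt)
echo "$result"
```

With the braces escaped, this should print AAA1234AA and BBB1234BB, one per line.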

Using awk

awk -F'[<|>]' '{ print $8 }' filename

That does not work, as I forgot to mention that there are also other lines in the file, in the format

<m:string>zzz1234ZZ</m:string>

Is there a way that command can be adjusted to get the string, bearing in mind the different lines in the file?

Are they always situated at the same place (position in the line)?

Try this instead:

awk -F'[<|>]' '{for(i=1;i<=NF;i++) { if($i=="m:string") print $(i+1); }}' filename

sed 's/.*<m:string>//
        s/<\/m:string>.*//'  filename

awk -F "<(/)?m:string>" '{print $2}' file
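
To see why the first awk picked $8, and why the loop version does not care about position, it can help to print the field index next to the value. A rough sketch, assuming a scratch file mixed.txt (a made-up name) containing one line of each kind:

```shell
# mixed.txt is a hypothetical scratch file: one SOAP line, one bare m:string line
cat > mixed.txt <<'EOF'
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
EOF

# Report the field number at which the wanted value lands on each line
positions=$(awk -F'[<|>]' '{for(i=1;i<=NF;i++) if($i=="m:string") print NR": field "(i+1)" = "$(i+1)}' mixed.txt)
echo "$positions"

# The loop from the reply above: print whatever follows an "m:string" field
values=$(awk -F'[<|>]' '{for(i=1;i<=NF;i++) { if($i=="m:string") print $(i+1); }}' mixed.txt)
echo "$values"
```

The value sits in field 8 on the SOAP line but in field 3 on the bare line, which is why a fixed $8 misses the second kind and the loop finds both.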

How about sed, matching the wanted string inside m:string tags that sit inside a SOAP tag:

$ cat x.dat
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
$ sed -n 's/<SOAP.*<m:string>\(.*\)<\/m:string>.*SOAP>/\1/p' x.dat
AAA1234AA
BBB1234BB
CCC1234CC
DDD1234DD
$

I'm 100% sure this will solve your problem. Check this:

cat file |awk -F ">" '{print $4}' |awk -F "<" '{print $1}'

I have used this one. Works perfectly!

Thanks to all those who contributed.
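
One caveat for later readers: the two-stage awk pipeline takes the fourth ">"-separated piece, so it relies on the value sitting at the same position on every line. A small sketch, assuming mixed input in a hypothetical file mixed2.txt, suggests that a bare <m:string> line comes back empty:

```shell
# mixed2.txt is a hypothetical scratch file: one SOAP line, one bare m:string line
cat > mixed2.txt <<'EOF'
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
EOF

# The pipeline from the reply above, reading the file directly instead of via cat
by_position=$(awk -F ">" '{print $4}' mixed2.txt | awk -F "<" '{print $1}')

# Count how many of the two values were actually recovered
found=$(printf '%s\n' "$by_position" | grep -c '1234')
echo "values found: $found of 2"
```

If every line in the real file has the SOAP layout this does not matter, but with mixed lines one of the tag-based commands is probably the safer choice.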

Ah, I believe I misunderstood you in what data you needed. For the sake of completeness then here is my fixed sed solution:

 
$ cat x.dat
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
$ sed -n 's/.*<m:string>\(.*\)<\/m:string>.*/\1/p' x.dat
AAA1234AA
BBB1234BB
zzz1234ZZ
CCC1234CC
DDD1234DD
$
 
That looks a lot cleaner to me and I understand it better! Thanks
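
A side note on the -n option and the trailing p in that last sed: together they print only the lines where the substitution succeeded. Without them, lines that have no m:string element are passed through unchanged. A short sketch, assuming a hypothetical file other.txt whose second line (an invented m:header element) has no m:string tag:

```shell
# other.txt is a made-up file; the m:header line is invented for illustration
cat > other.txt <<'EOF'
<m:string>zzz1234ZZ</m:string>
<m:header>no string element here</m:header>
EOF

# -n plus the p flag: only lines where the substitution succeeded are printed
with_n=$(sed -n 's/.*<m:string>\(.*\)<\/m:string>.*/\1/p' other.txt)
echo "$with_n"

# Without -n and p, the unmatched line is passed through unchanged
without_n=$(sed 's/.*<m:string>\(.*\)<\/m:string>.*/\1/' other.txt)
echo "$without_n"
```

The first command prints only zzz1234ZZ, while the second also echoes the m:header line as it stands.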