brunlea
January 30, 2013, 11:44am
1
Hi,
I have searched all over the forums for a problem similar to mine. I have found many, but unfortunately I have not been able to get any of the solutions to work for me.
What I am attempting to do is extract part of a string from a line in a file. Lines of this form appear multiple times in the file. Example file:
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
I have tried using
sed 's/\([A-Z]{3}[0-9]{4}[A-Z]{2}\).*/\1/' file
I expected this to remove everything on a line apart from the AAA1234AA string, but it does not remove anything at all. Is there any way I can extract only the string?
I'm quite inexperienced in Unix scripting, as you can probably tell. I'm on HP-UX.
Thanks
---------- Post updated at 11:44 AM ---------- Previous update was at 11:21 AM ----------
I have also tried
sed -n 's/.*\([A-Z]{3}[0-9]{4}[A-Z]{2}\).*/\1/p' file
But this has not worked either.
However, if I try
sed -n 's/.*\(AAA1234AA\).*/\1/p' file
this brings back the desired result (but only for that specific string).
Any ideas?
Yoda
January 30, 2013, 11:47am
2
Using awk
awk -F'[<|>]' '{ print $8 }' filename
brunlea
January 30, 2013, 12:06pm
3
That does not work, as I forgot to mention that there are also other lines in the file, in the format
<m:string>zzz1234ZZ</m:string>
Is there a way that command can be adjusted to get the string bearing in mind the different lines in the file?
vbe
January 30, 2013, 12:08pm
4
Are they always situated at the same place (position in the line)?
Yoda
January 30, 2013, 12:10pm
5
Try this instead:
awk -F'[<|>]' '{for(i=1;i<=NF;i++) { if($i=="m:string") print $(i+1); }}' filename
sed 's/.*<m:string>//
s/<\/m:string>.*//' filename
awk -F "<(/)?m:string>" '{print $2}' file
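To show how the field-scanning awk loop above behaves on both line shapes mentioned in this thread, here is a minimal sketch. The temporary file and the function name extract_by_field are made up for illustration, and the separator is written as [<>] because a | inside a bracket expression is just a literal character that never occurs in this data.

```shell
# Sample data mirroring the two line shapes in this thread (written to a temp file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
EOF

# extract_by_field is a hypothetical helper name, not something from the thread.
# It splits each line on < and >, then prints the field that directly follows
# any field that is exactly "m:string" (the closing tag yields "/m:string",
# so it is not matched).
extract_by_field() {
    awk -F'[<>]' '{ for (i = 1; i < NF; i++) if ($i == "m:string") print $(i + 1) }' "$1"
}

extract_by_field "$sample"
```

Because the loop looks for the tag name rather than a fixed field number, it should not matter how many other tags come before m:string on a line.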
gary_w
January 30, 2013, 1:50pm
8
How about sed where the string wanted is inside of m:string tags which are in a SOAP tag:
$ cat x.dat
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
$ sed -n 's/<SOAP.*<m:string>\(.*\)<\/m:string>.*SOAP>/\1/p' x.dat
AAA1234AA
BBB1234BB
CCC1234CC
DDD1234DD
$
I'm 100% sure this will solve your problem; try this:
cat file |awk -F ">" '{print $4}' |awk -F "<" '{print $1}'
brunlea
January 31, 2013, 5:57am
10
I have used this one. Works perfectly!
Thanks to all those who contributed.
gary_w
January 31, 2013, 9:42am
11
Ah, I believe I misunderstood you in what data you needed. For the sake of completeness then here is my fixed sed solution:
$ cat x.dat
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>BBB1234BB</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
<SOAP </m:req><m:body><m:string>CCC1234CC</m:string></m:body></m:req></SOAP>
<SOAP </m:req><m:body><m:string>DDD1234DD</m:string></m:body></m:req></SOAP>
$ sed -n 's/.*<m:string>\(.*\)<\/m:string>.*/\1/p' x.dat
AAA1234AA
BBB1234BB
zzz1234ZZ
CCC1234CC
DDD1234DD
$
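One caveat with the sed command above: the leading .* is greedy, so if a line ever held more than one m:string element, only the last value on that line would be printed. The sample data in this thread has one element per line, so this may never matter here. As a sketch only, assuming an awk that provides match() with RSTART and RLENGTH (POSIX awk does), every value on a line could be printed like this. The function name extract_all and the two-element test line are invented for illustration.

```shell
# Hypothetical input: two m:string elements on a single line.
sample=$(mktemp)
printf '%s\n' '<m:string>AAA1234AA</m:string><m:string>BBB1234BB</m:string>' > "$sample"

# extract_all is an illustrative name. It repeatedly finds the next
# <m:string>...</m:string> element, prints the text between the tags,
# then continues searching in the remainder of the line.
extract_all() {
    awk '{
        line = $0
        while (match(line, /<m:string>[^<]*<\/m:string>/)) {
            # 10 = length of "<m:string>", 21 = 10 + length of "</m:string>"
            print substr(line, RSTART + 10, RLENGTH - 21)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$1"
}

extract_all "$sample"
```

On the two-element line this prints both values, one per line, whereas the sed substitution would print only BBB1234BB.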
brunlea
January 31, 2013, 9:59am
12
gary_w:
Ah, I believe I misunderstood you in what data you needed. For the sake of completeness then here is my fixed sed solution:
$ sed -n 's/.*<m:string>\(.*\)<\/m:string>.*/\1/p' x.dat
That looks a lot cleaner to me, and I understand it better! Thanks.
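For anyone wondering why the pattern-based attempts in the first post matched nothing: sed uses basic regular expressions by default, and there a bare {3} is most likely being read as the literal characters {, 3 and }. The repetition count has to be written \{3\}. The first attempt also lacked a leading .*, so even with the braces fixed it would have kept everything before the match. A minimal sketch of the corrected command follows. The function name extract_code and the temp file are made up, and LC_ALL=C is an added assumption so that [A-Z] means ASCII uppercase only.

```shell
# Sample data mirroring the two line shapes in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<SOAP </m:req><m:body><m:string>AAA1234AA</m:string></m:body></m:req></SOAP>
<m:string>zzz1234ZZ</m:string>
EOF

# extract_code is an illustrative name. In a basic regular expression the
# repetition counts are written \{3\}, not {3}. LC_ALL=C keeps [A-Z] limited
# to ASCII uppercase letters.
extract_code() {
    LC_ALL=C sed -n 's/.*\([A-Z]\{3\}[0-9]\{4\}[A-Z]\{2\}\).*/\1/p' "$1"
}

extract_code "$sample"
```

Note that this pattern only accepts three uppercase letters at the start, so a value such as zzz1234ZZ is skipped. Changing the first class to [A-Za-z] would include it, which is one reason the tag-based sed above is the simpler choice for this file.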