Extract part of word from XML

Hi All,

Can someone help me capture a word from XML using sed, awk, or any other way in Unix?

I have a file abc.xml like this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<NREC xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<HEADER>
  <SOURCE>MAIL</SOURCE>
  <MSGID>7971</MSGID>
  <MESSAGETYPE>NRECAB</MESSAGETYPE>
  <MESSAGEFUNCTION>NEWM</MESSAGEFUNCTION>
  <REVID>30842</REVID>
  <MARKET>US</MARKET>
  <PROCESSINGDESK>01</PROCESSINGDESK>
  <PREPDATE>20120123012446</PREPDATE>
</HEADER>

When I FTP'ed this file to Unix, it became a single line like this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?><NREC xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><HEADER><SOURCE>MAIL</SOURCE><MSGID>797</MSGID><MESSAGETYPE>NRECAB</MESSAGETYPE><MESSAGEFUNCTION>NEWM</MESSAGEFUNCTION><REVID>30842</REVID><MARKET>US</MARKET><PROCESSINGDESK>01</PROCESSINGDESK><PREPDATE>20120123012446</PREPDATE></HEADER>

Now my requirement is to extract only "NRECAB" and "US".

Please help me capture these particular words, either in a single command or in separate commands.

Thanks in advance,
Naveen kumar c

$ perl -ne 'if(/<MESSAGETYPE|<MARKET/){s/.*?>(.*?)<.*/$1/; print}' abc.xml
NRECAB
US

Sorry Balajesuri, this command is not working.

awk -F"[<> ]" '{for(i=1;i<=NF;i++){if($i~"MESSAGETYPE|MARKET"){if($(i+1)){print $(i+1)}}}}'  infile

--ahamed

@naveenkumarc: Please provide more details. What happens when you give that perl one-liner?

@Balajesuri: $ perl -ne 'if(/<MESSAGETYPE|<MARKET/){s/.*?>(.*?)<.*/$1/; print}' abc.xml

$

--No output.--

---------- Post updated at 08:50 AM ---------- Previous update was at 08:47 AM ----------

Thanks ahamed,
It's working. Can you please explain how the above command works?
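
For reference, the awk one-liner can be unpacked roughly as follows. This is the same logic as ahamed's command, only spread over several lines with comments. The sample line is shortened from the file in the question, and the variable names xml and result are only for illustration.

```shell
# Sample single-line input, shortened from the file in the question.
xml='<HEADER><SOURCE>MAIL</SOURCE><MESSAGETYPE>NRECAB</MESSAGETYPE><MARKET>US</MARKET></HEADER>'

result=$(printf '%s\n' "$xml" | awk -F'[<> ]' '
{
    # -F"[<> ]" splits the line at every "<", ">" or space, so tag names
    # and the text between tags each land in a field of their own.
    for (i = 1; i <= NF; i++) {
        # A field containing MESSAGETYPE or MARKET is a tag name
        # (opening or closing).
        if ($i ~ "MESSAGETYPE|MARKET") {
            # After an opening tag the next field is the value; after a
            # closing tag it is empty, so nothing is printed for it.
            if ($(i + 1)) {
                print $(i + 1)
            }
        }
    }
}')

printf '%s\n' "$result"
```

Two caveats follow from this approach: a value that contains a space would be split across fields, so only its first word would be printed, and a value of 0 would be skipped because awk treats it as false in the if test.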