Extracting XML Tag Contents

Hi Jean

I need your help writing a shell script. I have no experience with Unix programming. I have a large file, about 400 MB, containing about 50,000 XML messages separated by a tab, I think. I need to extract only 4 values from each XML message and write them to a new file. Please help me with this.

Input File:

<logRequest xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:str="http://exslt.org/strings" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:secext="http://schemas.xmlsoap.org/ws/2002/04/secext" xmlns:rrbfunc="urn:fanta:bus:schemas:functions:1.0" xmlns:bus="urn:fanta:bus:schemas:context:1.0" xmlns:regexp="http://exslt.org/regular-expressions" xmlns:metrics20="urn:fanta:bus:schemas:metrics:2.0" xmlns:metrics10="urn:fanta:bus:schemas:metrics:1.0" xmlns:exsl="http://exslt.org/common" xmlns:endpt="urn:schemas.fantacom/bus/1.0/spInfo" xmlns:date="http://exslt.org/dates-and-times" xmlns:common="urn:fanta:bus:xslt:common.xsl" xmlns:cam="urn:fanta:comsec:authn:1.0"><logHeader><timestamp>2008-07-24T07:15:48.457000-04:00</timestamp><direction>response</direction><logType>SERVICE</logType></logHeader><logPayload><SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><Security xmlns="http://schemas.xmlsoap.org/ws/2002/04/secext"><wsse:BinarySecurityToken EncodingType="sentry:Base64Binary" ValueType="sentry:CSK1" cam:Username="262979139" cam:OpaqueId="262979139" xmlns:sentry="urn:fanta:sentry:schemas:security:1.2">pa1044iG3KjTWDx2DLRcQQliPVT8ryGTbVDOSP32NU4JSTP0k@</wsse:BinarySecurityToken></Security><context xmlns="urn:fanta:bus:schemas:context:1.0"><PilotRollout xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><Region xmlns="">RZ_SAMS</Region></PilotRollout><channel xmlns:i="http://www.w3.org/2001/XMLSchema-instance">IO</channel><properties xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><property name="WebPartId">EmailAddressWebPart</property><property name="WebPartAction">Default</property><property name="CorrelatorId">54aee7c2-dbb7-49fe-b853-cbb2ea87ec7b</property><property name="AsyncCall"/></properties></context><currentCorrelId 
xmlns="urn:fanta:bus:schemas:metrics:1.0">e5befe1c-9b73-4586-830a-eff4cb485492</currentCorrelId><metrics10:point id="e710b65e-f483-4e21-bfbe-aac13a892fea" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="XX.XXX.XXX.XX" type="fantabus.intermediary"><metrics10:start>2008-07-24 11:15:47.919000 UTC</metrics10:start><metrics10:block>2008-07-24 11:15:48.418000 UTC</metrics10:block></metrics10:point><bus:point type="fantabus.provider" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="C2VTR6Z4" id="003425872549152C2VTR6Z45564001" xmlns:bus="urn:fanta:bus:schemas:metrics:1.0"><bus:start>2008-07-24 11:15:49.118778 UTC</bus:start><bus:block>2008-07-24 11:15:49.128023 UTC</bus:block><bus:unblock>2008-07-24 11:15:49.149758 UTC</bus:unblock><bus:stop>2008-07-24 11:15:49.152776 UTC</bus:stop></bus:point><endpt:spInfo><endpt:tranId>RRFQ</endpt:tranId><endpt:operation>GetPaperless</endpt:operation><endpt:TORName>C2VTR6Z4</endpt:TORName><endpt:AORName>C2VAR2Z7</endpt:AORName><endpt:taskNum>0090987</endpt:taskNum><endpt:UOWID>C2BD376A4B3D8E05</endpt:UOWID></endpt:spInfo></SOAP-ENV:Header></logPayload></logRequest>
<logRequest xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:str="http://exslt.org/strings" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:secext="http://schemas.xmlsoap.org/ws/2002/04/secext" xmlns:rrbfunc="urn:fanta:bus:schemas:functions:1.0" xmlns:bus="urn:fanta:bus:schemas:context:1.0" xmlns:regexp="http://exslt.org/regular-expressions" xmlns:metrics20="urn:fanta:bus:schemas:metrics:2.0" xmlns:metrics10="urn:fanta:bus:schemas:metrics:1.0" xmlns:exsl="http://exslt.org/common" xmlns:endpt="urn:schemas.fantacom/bus/1.0/spInfo" xmlns:date="http://exslt.org/dates-and-times" xmlns:common="urn:fanta:bus:xslt:common.xsl" xmlns:cam="urn:fanta:comsec:authn:1.0"><logHeader><timestamp>2008-07-24T07:15:48.457000-04:00</timestamp><direction>response</direction><logType>SERVICE</logType></logHeader><logPayload><SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><Security xmlns="http://schemas.xmlsoap.org/ws/2002/04/secext"><wsse:BinarySecurityToken EncodingType="sentry:Base64Binary" ValueType="sentry:CSK1" cam:Username="262979139" cam:OpaqueId="262979139" xmlns:sentry="urn:fanta:sentry:schemas:security:1.2">pa1044iG3KjTWDx2DLRcQQliPVT8ryGTbVDOSP32NU4JSTP0k@</wsse:BinarySecurityToken></Security><context xmlns="urn:fanta:bus:schemas:context:1.0"><PilotRollout xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><Region xmlns="">RZ_SAMS</Region></PilotRollout><channel xmlns:i="http://www.w3.org/2001/XMLSchema-instance">IO</channel><properties xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><property name="WebPartId">EmailAddressWebPart</property><property name="WebPartAction">Default</property><property name="CorrelatorId">54aee7c2-dbb7-49fe-b853-cbb2ea87ec7b</property><property name="AsyncCall"/></properties></context><currentCorrelId 
xmlns="urn:fanta:bus:schemas:metrics:1.0">e5befe1c-9b73-4586-830a-eff4cb485492</currentCorrelId><metrics10:point id="e710b65e-f483-4e21-bfbe-aac13a892fea" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="XX.XXX.XXX.XXX" type="fantabus.intermediary"><metrics10:start>2008-07-24 11:15:47.919000 UTC</metrics10:start><metrics10:block>2008-07-24 11:15:48.418000 UTC</metrics10:block></metrics10:point><bus:point type="fantabus.provider" parent="e5befe1c-9b73-4586-830a-eff4cb485492" node="C2VTR6Z4" id="003425872549152C2VTR6Z45564001" xmlns:bus="urn:fanta:bus:schemas:metrics:1.0"><bus:start>2008-07-24 11:15:49.118778 UTC</bus:start><bus:block>2008-07-24 11:15:49.128023 UTC</bus:block><bus:unblock>2008-07-24 11:15:49.149758 UTC</bus:unblock><bus:stop>2008-07-24 11:15:49.152776 UTC</bus:stop></bus:point><endpt:spInfo><endpt:tranId>RRFQ</endpt:tranId><endpt:operation>GetPaperless</endpt:operation><endpt:TORName>C2VTR6Z4</endpt:TORName><endpt:AORName>C2VAR2Z7</endpt:AORName><endpt:taskNum>0090987</endpt:taskNum><endpt:UOWID>C2BD376A4B3D8E05</endpt:UOWID></endpt:spInfo></SOAP-ENV:Header></logPayload></logRequest>

Expected Output: output.txt containing the following

timestamp webpartID bus:block bus:unblock endpt:operation
2008-07-24T07:15:48.457000-04:00 EmailAddressWebPart 2008-07-24 11:15:49.128023 UTC 2008-07-24 11:15:49.149758 UTC GetPaperless
2008-07-24T07:15:48.457000-04:00 EmailAddressWebPart 2008-07-24 11:15:49.128023 UTC 2008-07-24 11:15:49.149758 UTC GetPaperless
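perl -nle '
  BEGIN {
    $, = " ";  # output field separator used by print
    print "timestamp webpartID bus:block bus:unblock endpt:operation";
  }
  # Capture the five values in document order. Lines that do not contain
  # all five tags are skipped instead of producing an empty output line.
  my @fields =
    /timestamp>(.*?)<.*?
     "WebPartId">(.*?)<.*?
     bus:block>(.*?)<.*?
     bus:unblock>(.*?)<.*?
     endpt:operation>(.*?)<
    /x or next;
  print @fields;
' filename > output.txt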
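
The sample input appears to wrap each message across two physical lines (the break falls right after `<currentCorrelId`). If the real file does the same, a line-at-a-time match cannot see all five tags at once. Below is a rough sketch in plain `awk` that glues lines together until it sees `</logRequest>` and then pulls the values out. It assumes every message ends with `</logRequest>` at the end of a line and that each of the five tags appears at most once per message; `extract_fields` and `value_after` are names invented for this example. Output is tab-separated because the `bus:` timestamps themselves contain spaces.

```shell
# Hypothetical helper: prints one tab-separated row per <logRequest> message.
# Assumes each message closes with </logRequest> at the end of a physical line.
extract_fields() {
  awk '
    # Return the text between the first occurrence of "tag" and the next "<".
    function value_after(text, tag,    start, rest, stop) {
      start = index(text, tag)
      if (start == 0) return ""
      rest = substr(text, start + length(tag))
      stop = index(rest, "<")
      return (stop > 0) ? substr(rest, 1, stop - 1) : rest
    }
    BEGIN { print "timestamp\twebpartID\tbus:block\tbus:unblock\tendpt:operation" }
    {
      rec = rec $0                                # join wrapped lines of one message
      if (index($0, "</logRequest>") == 0) next   # message not complete yet
      printf "%s\t%s\t%s\t%s\t%s\n",
        value_after(rec, "<timestamp>"),
        value_after(rec, "<property name=\"WebPartId\">"),
        value_after(rec, "<bus:block>"),
        value_after(rec, "<bus:unblock>"),
        value_after(rec, "<endpt:operation>")
      rec = ""                                    # start collecting the next message
    }
  ' "$@"
}
```

It would be called as `extract_fields filename > output.txt`. Only one message is held in memory at a time, so the 400 MB file size should not be a problem. Missing tags come out as empty fields rather than dropping the row.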

I want to extract the value of WebPartId. Please let me know where I am going wrong.

egrep "<property name=\"WebPartId" /tmp/datapower/GetCustBankAccountsService.log | sed -e "s/^.*WebPartId"\" | cut -f2 -d">"| cut -f1 -d"<" > /tmp/temp1.xls
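
The `sed` step in that pipeline is the likely culprit: after the shell strips the quotes, `sed` receives `s/^.*WebPartId"`, which has a pattern but no replacement part and no closing delimiter, so it is not a complete `s` command. A minimal sketch that does the matching and trimming in a single `sed` call, assuming the whole `WebPartId` property sits on one line and only one value per line is wanted (`extract_webpart_id` is a made-up name for illustration):

```shell
# Hypothetical helper: prints the WebPartId value from each line that has one.
extract_webpart_id() {
  # -n suppresses the default output; the trailing p prints only lines where
  # the substitution matched, reduced to the captured value.
  sed -n 's/.*<property name="WebPartId">\([^<]*\)<.*/\1/p' "$@"
}
```

Used as `extract_webpart_id /tmp/datapower/GetCustBankAccountsService.log > /tmp/temp1.xls`, this replaces the `egrep`, `sed`, and both `cut` steps.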