XML Log Parsing

I have a log file of around 300 MB that contains continuous SOAP responses like the one shown below (I have attached only one sample SOAP response). I need the following fields extracted and written to a new file:

timestamp
WebPartId
bus:block
bus:unblock
endpt:operation

Please help me.

<logRequest xmlns:wsse="http://docs.ddoasis-open.org/wss/2004/01/oasis-200401-wss-1.0.xsd" xmlns:str="http://exslt.org/strings" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:secext="WS-Security" xmlns:rrbfunc="urn:schemas:functions:1.0" xmlns:rrbus="urn:schemas:context:1.0" xmlns:regexp="http://exslt.org/regular-expressions" xmlns:metrics20="urn:metrics:2.0" xmlns:metrics10="urn:metrics:1.0" xmlns:exsl="http://exslt.org/common" xmlns:endpt="urn:schemas.bcom/rrbus/1.0/spInfo" xmlns:date="http://exslt.org/dates-and-times" xmlns:common="urn::xslt:common.xsl" xmlns:cam="urn:comsec:authn:1.0"><logHeader><timestamp>2008-07-24T04:17:04.137000-04:00</timestamp><direction>response</direction><logType>SERVICE</logType></logHeader><logPayload><SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><Security xmlns="WS-Security"><wsse:BinarySecurityToken EncodingType="sentry:Base64Binary" ValueType="sentry:CSK1" cam:Username="639025903" cam:OpaqueId="639025903" xmlns:sentry="urn::schemas:security:1.2">pa1044zV0vIpymMC5uPnnpGlsT-aTye3EX0@</wsse:BinarySecurityToken></Security><context xmlns="urn:schemas:context:1.0"><PilotRollout xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><Region xmlns="">R_SAS</Region></PilotRollout><channel xmlns:i="http://www.w3.org/2001/XMLSchema-instance">IO</channel><properties xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><property name="WebPartId">AcctBalances</property><property name="WebPartAction">Default</property><property name="CorrelatorId">54057-d7d7e997a6c7</property><property name="AsyncCall"/></properties></context><currentCorrelId xmlns="urn:hemas:metrics:1.0">7e36a2fc-2cd3-43af-9ca4ae8f</currentCorrelId><metrics10:point id="873e10cd-3691-4e3b4f190" parent="7e36a29-a31475a4ae8f" node="10.14.56.59" type="srmediary"><metrics10:start>2008-07-24 08:17:03.321000 UTC</metrics10:start><metrics10:block>2008-07-24 08:17:03.520000 
UTC</metrics10:block></metrics10:point><bus:point type=us.provider" parent="7e36a2fc-2cd31475a4ae8f" node="C1VTR6" id="003425861VTR6Z30818001" xmlns:rrbus="urn:schemas:metrics:1.0"><bus:start>2008-07-24 08:17:02.978203 UTC</bus:start><bus:block>2008-07-24 08:17:03.242739 UTC</bus:block><bus:unblock>2008-07-24 08:17:03.583753 UTC</bus:unblock><bus:stop>2008-07-24 08:17:03.588468 UTC</bus:stop></bus:point><endpt:spInfo><endpt:tranId>RRXI</endpt:tranId><endpt:operation>GetThirdPartyAcctInfo</endpt:operation><endpt:TORName>C1VTRZ3</endpt:TORName><endpt:AORName>C1VA2Z9</endpt:AORName><endpt:taskNum>000690</endpt:taskNum><endpt:UOWID>C2BF750D1B3783</endpt:UOWID></endpt:spInfo></SOAP-ENV:Header></logPayload></logRequest>

Expected output:

timestamp WebPartId bus:block bus:unblock endpt:operation
2008-07-24T04:17:04.137000-04:00 AcctBalances 2008-07-24 08:17:03.321000 2008-07-24 08:17:03.421000 GetThirdPartyAcctInfo

The best way to handle this is to write an XSLT stylesheet to transform the XML "document" into the desired output. (BTW, the provided XML is not well-formed: the "bus" namespace prefix is never declared.)

Another way, if you want to stick to UNIX utilities, is to convert the XML into PYX format (there are a number of tools available, e.g. xmlstarlet, xmln, etc.; do a Web search for PYX) and then use sed, awk or grep to extract the relevant information.

For example, here is the equivalent PYX for the XML you provided (after correcting it so that it parses):

(logRequest
Axmlns:exsl http://exslt.org/common
Axmlns:endpt urn:schemas.bcom/rrbus/1.0/spInfo
Axmlns:date http://exslt.org/dates-and-times
Axmlns:common urn::xslt:common.xsl
Axmlns:bus urn::fpmurphy
Axmlns:cam urn:comsec:authn:1.0
Axmlns:wsse http://docs.ddoasis-open.org/wss/2004/01/oasis-200401-wss-1.0.xsd
Axmlns:str http://exslt.org/strings
Axmlns:soapenv http://schemas.xmlsoap.org/soap/envelope/
Axmlns:secext http://schemas.xmlsoap.org/ws/2002/04/secext
Axmlns:rrbfunc urn:schemas:functions:1.0
Axmlns:rrbus urn:schemas:context:1.0
Axmlns:regexp http://exslt.org/regular-expressions
Axmlns:metrics20 urn:metrics:2.0
Axmlns:metrics10 urn:metrics:1.0
-\n
(logHeader
-\n
(timestamp
-2008-0724T04:17:04.13700004:00
)timestamp
-\n
(direction
-response
)direction
-\n
(logType
-SERVICE
)logType
-\n
)logHeader
-\n
(logPayload
-\n
(SOAP-ENV:Header
Axmlns:SOAP-ENV http://schemas.xmlsoap.org/soap/envelope/
Axmlns:s http://schemas.xmlsoap.org/soap/envelope/
-\n
(Security
Axmlns http://schemas.xmlsoap.org/ws/2002/04/secext
-\n
(wsse:BinarySecurityToken
AEncodingType sentry:Base64Binary
AValueType sentry:CSK1
Acam:Username 639025903
Acam:OpaqueId 639025903
Axmlns:sentry urn::schemas:security:1.2
-\n                        pa1044zV0vIpymMC5uPnnpGlsT-aTye3EX0@\n
)wsse:BinarySecurityToken
-\n
)Security
-\n
(context
Axmlns urn:schemas:context:1.0
-\n
(PilotRollout
Axmlns:i http://www.w3.org/2001/XMLSchema-instance
-\n
(Region
Axmlns
-R_SAS
)Region
-\n
)PilotRollout
-\n
(channel
Axmlns:i http://www.w3.org/2001/XMLSchema-instance
-IO
)channel
-\n
(properties
Axmlns:i http://www.w3.org/2001/XMLSchema-instance
-\n
(property
Aname WebPartId
-AcctBalances
)property
-\n
(property
Aname WebPartAction
-Default
)property
-\n
(property
Aname CorrelatorId
-54057-d7d7e997a6c7
)property
-\n
(property
Aname AsyncCall
)property
-\n
)properties
-\n
)context
-\n
(currentCorrelId
Axmlns urn:hemas:metrics:1.0
-7e36a2fc-2cd3-43af-9ca4ae8f
)currentCorrelId
-\n
(metrics10:point
Aid 873e10cd-3691-4e3b4f190
Aparent 7e36a29-a31475a4ae8f
Anode 10.14.56.59
Atype srmediary
-\n
(metrics10:start
-2008-07-24 08:17:03.321000 UTC
)metrics10:start
-\n
(metrics10:block
-2008-07-24 08:17:03.520000 UTC
)metrics10:block
-\n
)metrics10:point
-\n
(bus:point
Aid 003425861VTR6Z30818001
Aparent 7e36a2fc-2cd31475a4ae8f
Anode C1VTR6
Atype us.provider
Axmlns:rrbus urn:schemas:metrics:1.0
-\n
(bus:start
-2008-07-24 08:17:02.978203 UTC
)bus:start
-\n
(bus:block
-2008-07-24 08:17:03.242739 UTC
)bus:block
-\n
(bus:unblock
-2008-07-24 08:17:03.583753 UTC
)bus:unblock
-\n
(bus:stop
-2008-07-24 08:17:03.588468 UTC
)bus:stop
-\n
)bus:point
-\n
(endpt:spInfo
-\n
(endpt:tranId
-RRXI
)endpt:tranId
-\n
(endpt:operation
-GetThirdPartyAcctInfo
)endpt:operation
-\n
(endpt:TORName
-C1VTRZ3
)endpt:TORName
-\n
(endpt:AORName
-C1VA2Z9
)endpt:AORName
-\n
(endpt:taskNum
-000690
)endpt:taskNum
-\n
(endpt:UOWID
-C2BF750D1B3783
)endpt:UOWID
-\n
)endpt:spInfo
-\n
)SOAP-ENV:Header
-\n
)logPayload
-\n
)logRequest

And here is an example of one way to extract the first three pieces of information you are looking for:

$ sed -n -e '/(timestamp/{n;s/^-//p;}' -e '/Aname WebPartId/{n;s/^-//p;}' -e '/(bus:block/{n;s/^-//p;}' pyxfile
2008-0724T04:17:04.13700004:00
AcctBalances
2008-07-24 08:17:03.242739 UTC
$
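
To pull all five fields at once, one option is a small awk filter over the same PYX stream. What follows is only a minimal sketch under a few assumptions: extract_fields is a name made up for this example, each wanted value is assumed to sit on the text line right after its element (or after the name="WebPartId" attribute), every record is assumed to end with )logRequest, and the trailing " UTC" is dropped to match the expected output.

```shell
#!/bin/sh
# Sketch only: extract_fields is a hypothetical helper name, not part of any tool.
# It reads PYX on stdin and prints one space-separated line per logRequest record.
extract_fields() {
    awk '
        BEGIN { print "timestamp WebPartId bus:block bus:unblock endpt:operation" }

        # These PYX lines announce that the next text line holds a wanted value.
        $0 == "(timestamp"       { want = "ts";  next }
        $0 == "Aname WebPartId"  { want = "wp";  next }
        $0 == "(bus:block"       { want = "blk"; next }
        $0 == "(bus:unblock"     { want = "unb"; next }
        $0 == "(endpt:operation" { want = "op";  next }

        # Text line: keep it if a field is pending, dropping the leading "-"
        # and the trailing " UTC" (absent from the expected output).
        /^-/ {
            if (want != "") {
                v = substr($0, 2)
                sub(/ UTC$/, "", v)
                val[want] = v
                want = ""
            }
            next
        }

        # Any other element boundary cancels a pending field (empty element).
        /^[()]/ { want = "" }

        # End of record: print the collected values, then reset for the next one.
        $0 == ")logRequest" {
            print val["ts"], val["wp"], val["blk"], val["unb"], val["op"]
            split("", val)
        }
    '
}
```

With the PYX saved as pyxfile, as in the sed example, running extract_fields < pyxfile > extracted.txt should write the header plus one line per record (extracted.txt is just a placeholder name). A field that is missing from a record comes out empty rather than stopping the run.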

Could you help me with this using Unix commands?