Extract specific data from an XML-format file

Hi,

I need to extract the Start Time value under the '<LogEvent ID="Timer Start">' tag from a file with the following pattern. The file lists other LogEvent IDs as well, which makes it harder for me to pick out the specific start time that I need.

.
.
.
</LogEvent>
<LogEvent ID="Timer Start">
<LogEventProperty Name="Event Type" ID="Event Type"><![CDATA[Timer Start]]></LogEventProperty>
<LogEventProperty Name="Start Date" ID="Start Date"><![CDATA[12/13/2007]]></LogEventProperty>
<LogEventProperty Name="Start Time" ID="Start Time"><![CDATA[19:04:52]]></LogEventProperty>
<LogEventProperty Name="End Date" ID="End Date"><![CDATA[12/13/2007]]></LogEventProperty>
<LogEventProperty Name="End Time" ID="End Time"><![CDATA[19:04:52]]></LogEventProperty>
<LogEventProperty Name="Result" ID="Result"><![CDATA[ ]]></LogEventProperty>
<LogEventProperty Name="Failure Reason" ID="Failure Reason"><![CDATA[ ]]></LogEventProperty>
<LogEventProperty Name="Failure Description" ID="Failure Description"></LogEventProperty>
<LogEventProperty Name="Virtual Tester Command ID" ID="Virtual Tester Command ID"><![CDATA[SUBMIT MESSAGES1RCF/MH541/10JUL0700/FRA/]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester First Time Stamp" ID="Virtual Tester First Time Stamp"><![CDATA[74718]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester Line Number" ID="Virtual Tester Line Number"><![CDATA[1084]]></LogEventProperty>
<LogEventProperty Name="Operating System" ID="Operating System"><![CDATA[Windows]]></LogEventProperty>
<LogEventProperty Name="OS Version" ID="OS Version"><![CDATA[2003]]></LogEventProperty>
<LogEventProperty Name="Processor" ID="Processor"><![CDATA[Pentium]]></LogEventProperty>
<LogEventProperty Name="Display Resolution" ID="Display Resolution"><![CDATA[1024x768]]></LogEventProperty>
<LogEventProperty Name="Display Color Bits" ID="Display Color Bits"><![CDATA[32]]></LogEventProperty>
<LogEventProperty Name="OS Service Pack" ID="OS Service Pack"><![CDATA[Service Pack 1]]></LogEventProperty>
<LogEventProperty Name="Memory Size" ID="Memory Size"><![CDATA[2048]]></LogEventProperty>
<LogEventProperty Name="Processor Number" ID="Processor Number"><![CDATA[4]]></LogEventProperty>
</LogEvent>
<LogEvent ID="Emulation Command">
<LogEventProperty Name="Event Type" ID="Event Type"><![CDATA[Emulation Command]]></LogEventProperty>
<LogEventProperty Name="Start Date" ID="Start Date"><![CDATA[12/13/2007]]></LogEventProperty>
<LogEventProperty Name="Start Time" ID="Start Time"><![CDATA[19:04:54]]></LogEventProperty>
<LogEventProperty Name="End Date" ID="End Date"><![CDATA[12/13/2007]]></LogEventProperty>
<LogEventProperty Name="End Time" ID="End Time"><![CDATA[19:04:54]]></LogEventProperty>
<LogEventProperty Name="Result" ID="Result"><![CDATA[Pass]]></LogEventProperty>
<LogEventProperty Name="Failure Reason" ID="Failure Reason"><![CDATA[ ]]></LogEventProperty>
<LogEventProperty Name="Failure Description" ID="Failure Description"></LogEventProperty>
<LogEventProperty Name="Virtual Tester Command ID" ID="Virtual Tester Command ID"><![CDATA[FSUBase~064]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester First Time Stamp" ID="Virtual Tester First Time Stamp"><![CDATA[76093]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester Last Time Stamp" ID="Virtual Tester Last Time Stamp"><![CDATA[76109]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester Line Number" ID="Virtual Tester Line Number"><![CDATA[1088]]></LogEventProperty>
<LogEventProperty Name="Operating System" ID="Operating System"><![CDATA[Windows]]></LogEventProperty>
<LogEventProperty Name="OS Version" ID="OS Version"><![CDATA[2003]]></LogEventProperty>
<LogEventProperty Name="Processor" ID="Processor"><![CDATA[Pentium]]></LogEventProperty>
<LogEventProperty Name="Display Resolution" ID="Display Resolution"><![CDATA[1024x768]]></LogEventProperty>
<LogEventProperty Name="Display Color Bits" ID="Display Color Bits"><![CDATA[32]]></LogEventProperty>
<LogEventProperty Name="OS Service Pack" ID="OS Service Pack"><![CDATA[Service Pack 1]]></LogEventProperty>
<LogEventProperty Name="Memory Size" ID="Memory Size"><![CDATA[2048]]></LogEventProperty>
<LogEventProperty Name="Processor Number" ID="Processor Number"><![CDATA[4]]></LogEventProperty>
</LogEvent>
<LogEvent ID="Env Variable Change">
<LogEventProperty Name="Event Type" ID="Event Type"><![CDATA[Env Variable Change]]></LogEventProperty>
.
.
.

I am currently trying to do it with commands like awk, sed and grep, but since time is running out (and I am not very good at scripting), I would appreciate any help in completing this task.

Thanks in advance.

Some points to add..

There are 3 other Log Event IDs that contain the word 'Start':

[root@sysh /home/share/gfsu]# more firstRun1_1905_131207.rtpar |grep "LogEvent ID" | grep Start | more
<LogEvent ID="Suite Start">
<LogEvent ID="User Start">
<LogEvent ID="Script Start">
<LogEvent ID="Timer Start">

Uhmm.. forgot another thing.. I only need the start time if the Virtual Tester Command ID is 'send messages', not 'submit messages'.

Here's another portion from the file:

<LogEvent ID="Timer Start">
<LogEventProperty Name="Event Type" ID="Event Type"><![CDATA[Timer Start]]></LogEventProperty>
<LogEventProperty Name="Start Date" ID="Start Date"><![CDATA[12/13/2007]]></LogEventProperty>
<LogEventProperty Name="Start Time" ID="Start Time"><![CDATA[19:04:46]]></LogEventProperty>
<LogEventProperty Name="End Date" ID="End Date"><![CDATA[12/13/2007]]></LogEventProperty>
<LogEventProperty Name="End Time" ID="End Time"><![CDATA[19:04:46]]></LogEventProperty>
<LogEventProperty Name="Result" ID="Result"><![CDATA[ ]]></LogEventProperty>
<LogEventProperty Name="Failure Reason" ID="Failure Reason"><![CDATA[ ]]></LogEventProperty>
<LogEventProperty Name="Failure Description" ID="Failure Description"></LogEventProperty>
<LogEventProperty Name="Virtual Tester Command ID" ID="Virtual Tester Command ID"><![CDATA[send messagesRCF/MH541/10JUL0700/FRA/T20]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester First Time Stamp" ID="Virtual Tester First Time Stamp"><![CDATA[68798]]></LogEventProperty>
<LogEventProperty Name="Virtual Tester Line Number" ID="Virtual Tester Line Number"><![CDATA[1177]]></LogEventProperty>
<LogEventProperty Name="Operating System" ID="Operating System"><![CDATA[Windows]]></LogEventProperty>
<LogEventProperty Name="OS Version" ID="OS Version"><![CDATA[2003]]></LogEventProperty>
<LogEventProperty Name="Processor" ID="Processor"><![CDATA[Pentium]]></LogEventProperty>
<LogEventProperty Name="Display Resolution" ID="Display Resolution"><![CDATA[1024x768]]></LogEventProperty>
<LogEventProperty Name="Display Color Bits" ID="Display Color Bits"><![CDATA[32]]></LogEventProperty>
<LogEventProperty Name="OS Service Pack" ID="OS Service Pack"><![CDATA[Service Pack 1]]></LogEventProperty>
<LogEventProperty Name="Memory Size" ID="Memory Size"><![CDATA[2048]]></LogEventProperty>
<LogEventProperty Name="Processor Number" ID="Processor Number"><![CDATA[4]]></LogEventProperty>
</LogEvent>

Hmm.. this is not good

Based on the example posted above, try this:

sed -n '/Start Time/s/^.*CDATA\[\(.*\)]].*$/\1/p' filename

Note that there are 2 records in the example that match the criteria.
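One way to narrow the match to Timer Start events only is to wrap the same substitution in a sed address range. This is a minimal sketch, assuming every event opens with a <LogEvent ID="..."> line and closes with </LogEvent> on its own line, as in the sample. The function name timer_start_times is made up for illustration.

```shell
#!/bin/sh
# timer_start_times (hypothetical name): print the Start Time value of
# every Timer Start event in the file given as $1.
timer_start_times() {
    # The address range limits the substitution to lines between the
    # Timer Start opening tag and the next closing </LogEvent>.
    sed -n '/<LogEvent ID="Timer Start">/,/<\/LogEvent>/ {
        /Name="Start Time"/ s/^.*CDATA\[\(.*\)]].*$/\1/p
    }' "$1"
}

# Usage: timer_start_times firstRun1_1905_131207.rtpar
```

This still prints the time of every Timer Start event, whatever its command ID is, so it only handles the first half of the problem.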

Yeah.. I just got the stuff this morning, and I only have 12 hours before submitting the report. Gah.

Thanks for your reply. Your code extracts the time nicely, but I don't need all of the times. It has to match these 2 conditions:

1) LogEvent ID="Timer Start"
2) <![CDATA[send messages

What if I just want to extract the whole LogEvent ID="Timer Start"?

From <LogEvent ID="Timer Start"> to </LogEvent>.

awk '/LogEvent ID="Timer Start"/ {print} BEGIN {i=0} /LogEvent ID="Timer Start"/ {while (i<20) {getline; print $0; i++} i=0}' firstRun1_1905_131207.rtpar

Let me know if there's a better solution.

Thanks.
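A possible alternative to counting 20 lines with getline is an awk range pattern, which prints everything from the opening tag to the next closing tag however many properties the event has. This is only a sketch, assuming the tags sit on their own lines as in the sample. The function name print_timer_blocks is made up for illustration.

```shell
#!/bin/sh
# print_timer_blocks (hypothetical name): print every Timer Start event,
# from its opening <LogEvent ...> line through its closing </LogEvent>.
print_timer_blocks() {
    # A range pattern with no action prints each line inside the range.
    awk '/<LogEvent ID="Timer Start">/,/<\/LogEvent>/' "$1"
}

# Usage: print_timer_blocks firstRun1_1905_131207.rtpar
```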

So uhmm.. I only have to extract the time (i.e. 19:04:46), which is located a few lines before the pattern 'send messages'. I have been trying a couple of commands, but they don't work :confused:

Help, anyone?

Thanks.

Thanks to this thread, "awk: need to extract a line before a pattern", I finally managed to extract the data that I wanted.

awk '$0~/Start Time/{lineno=NR; text=$0} NR==lineno+6 && $0~/send messages/{print text}' filename

That prints the Start Time line if the sixth line after it contains 'send messages'.

Using matrixmadhan's code:

sed -n '/Start Time/s/^.*CDATA\[\(.*\)]].*$/\1/p' filename

the timestamp is extracted.

Thanks! :stuck_out_tongue:
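For anyone finding this later: the two steps above could probably be folded into one awk pass that does not depend on 'send messages' being exactly 6 lines below the Start Time. This is a rough sketch, assuming the layout from the samples (one property per line, the command ID stored in the "Virtual Tester Command ID" property, and a lowercase 'send messages' prefix). The function name send_start_times is made up for illustration.

```shell
#!/bin/sh
# send_start_times (hypothetical name): print the Start Time of each
# Timer Start event whose command ID begins with "send messages".
send_start_times() {
    awk '
        # Entering a Timer Start event: set the flag, forget the old time.
        /<LogEvent ID="Timer Start">/ { in_timer = 1; start = ""; next }
        # Any closing tag ends the current event.
        /<\/LogEvent>/ { in_timer = 0; next }
        # Remember the text between "CDATA[" and "]]" on the Start Time line.
        in_timer && /Name="Start Time"/ {
            start = $0
            sub(/^.*CDATA\[/, "", start)
            sub(/\]\].*$/, "", start)
        }
        # Print the remembered time only for "send messages" commands.
        in_timer && /Name="Virtual Tester Command ID"/ && /CDATA\[send messages/ {
            print start
        }
    ' "$1"
}

# Usage: send_start_times firstRun1_1905_131207.rtpar
```

The match is case-sensitive, so the 'SUBMIT MESSAGES' records are skipped. Events with other IDs (Suite Start, User Start, Script Start) never set the flag, so their times are ignored too.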