Extract strings from multiple lines into one csv file

Hi all,

Please go through my requirement.
I have a log file in the location /opt/WebSphere61/AppServer/profiles/EMQbatchprofile/logs/EMQbatch
This file contains the follwing pattern data

<af type="tenured" id="42" timestamp="May 14 13:44:13 2011" intervalms="955.624">
<minimum requested_bytes="200" />
<time exclusiveaccessms="0.055" />
<tenured freebytes="0" totalbytes="2147483648" percent="0" >
<soa freebytes="0" totalbytes="2147483648" percent="0" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<gc type="global" id="42" totalid="42" intervalms="956.093">
<refs_cleared soft="2665" threshold="32" weak="341" phantom="6" />
<finalization objectsqueued="109" />
<timesms mark="661.941" sweep="17.140" compact="0.000" total="679.629" />
<tenured freebytes="227799624" totalbytes="2147483648" percent="10" >
<soa freebytes="227799624" totalbytes="2147483648" percent="10" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
</gc>
<tenured freebytes="227799080" totalbytes="2147483648" percent="10" >
<soa freebytes="227799080" totalbytes="2147483648" percent="10" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<time totalms="680.097" />
</af>
<af type="tenured" id="43" timestamp="May 14 13:50:13 2011" intervalms="9230.118">
<minimum requested_bytes="200" />
<time exclusiveaccessms="0.055" />
<tenured freebytes="0" totalbytes="2147483648" percent="0" >
<soa freebytes="0" totalbytes="2147483648" percent="0" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<gc type="global" id="42" totalid="42" intervalms="956.093">
<refs_cleared soft="2665" threshold="32" weak="341" phantom="6" />
<finalization objectsqueued="109" />
<timesms mark="661.941" sweep="17.140" compact="0.000" total="679.629" />
<tenured freebytes="227799624" totalbytes="2147483648" percent="10" >
<soa freebytes="227799624" totalbytes="2147483648" percent="10" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
</gc>
<tenured freebytes="227799080" totalbytes="2147483648" percent="13" >
<soa freebytes="227799080" totalbytes="2147483648" percent="10" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<time totalms="8540.01" />
</af>

Based on the above data I need to generate a csv file which is shown below.
Desired Output:

hostname, gc type, datetime, duration, location, intervalms, percent
tkdp2csmftp50, af, May 14 13:44:13 2011, 0.680097, EMQbatchprofile, .955624, 10
tkdp2csmftp50, af, May 14 13:50:13 2011, 8.54001, EMQbatchprofile, 9.230118, 13

In the desired output tkdp2csmftp50 is the hostname and the timings (Fields duration and intervalms)which are in millisecs needs to be converted in secs (divided by thousand) and location should be the from the directory path given above (/opt/WebSphere61/AppServer/profiles/EMQbatchprofile/logs/EMQbatch)

Assuming the sample data you have given, try this

#!/usr/bin/awk -f 
BEGIN{FS="<|=";print "hostname, gc type, datetime, duration, location, intervalms, percent";"hostname"|getline _1}
/^<af/{split($2,_8," ");_2=_8[1];split($5,_8,"\"");_3=_8[2];split($6,_8,"\"");_6=_8[2]/1000}
/^<tenured/{++i}i==3{i=0;split($5,_8,"\"");_7=_8[2]} /^<time totalms/{split($3,_8,"\"");_4=_8[2]/1000}
/^<\/af>/{split(FILENAME,_8,"/");print _1,_2,_3,_4,_8[5],_6,_7} OFS=", "

Put the above code in a script say run and execute it like this

/user/ahamed/test $ ./run /opt/WebSphere61/AppServer/profiles/EMQbatchprofile/logs/EMQbatch/log
hostname, gc type, datetime, duration, location, intervalms, percent
localhost, af, May 14 13:44:13 2011, 0.680097, EMQbatchprofile, 0.955624, 10
localhost, af, May 14 13:50:13 2011, 8.54001, EMQbatchprofile, 9.23012, 13

If your OS is solaris, please use /usr/xpg4/bin/awk

regards,
Ahamed

1 Like

Thanks for your quick response Ahmed, But I found some difference in between the actual result and expected result. Can you please look into this.
Actual result output

Expexcted result is

hostname, gc type, datetime, intervalms are correct but the duration, location and percent is not populating properly. (Location would be PJPETLbatch where the whole path is /opt/WebSphere61/AppServer/profiles/PJPETLbatchprofile/logs/PJPETLbatch/logfile.log)
And I need the output without any other log information except the above values.

can you paste the sample data you tried?

regards,
Ahamed

Hi Ahamed,

Please find the sample data from the lof file

---------- Post updated at 12:27 PM ---------- Previous update was at 12:17 PM ----------

Expected result from the above sample data

I am getting the correct output. For the location, a minor change may be required.

#!/usr/bin/awk -f  
BEGIN{FS="<|=";print "hostname, gc type, datetime, duration, location, intervalms, percent";"hostname"|getline _1} 
/^<af/{split($2,_8," ");_2=_8[1];split($5,_8,"\"");_3=_8[2];split($6,_8,"\"");_6=_8[2]/1000} 
/^<tenured/{++i}i==3{i=0;split($5,_8,"\"");_7=_8[2]} 
/^<time totalms/{split($3,_8,"\"");_4=_8[2]/1000} 
/^<\/af>/{t=split(FILENAME,_8,"/");print _1,_2,_3,_4,_8[t-1],_6,_7} OFS=", "
/user/ahamed/test $ ./run f
hostname, gc type, datetime, duration, location, intervalms, percent
localhost, af, Nov 15 22:55:30 2010, 0.314244, f, 0, 81
localhost, af, Dec 02 15:17:29 2010, 0.084461, f, 64967.2, 97
...

Here the location is different cause I didnt give the full path.

regards,
Ahamed

1 Like

Hi Ahamed,

I am able to get the location and values correctly right now. Thanks for your help.
But the unncessary data is also coming. Please refer the below output after the execution of the script.
Code

Output

How to rip this unnecessary data from the generated values
BTW my OS is Linux

I have no idea why it is behaving like this for you. Anyways try this

#!/usr/bin/awk -f  
BEGIN{FS="<|=";print "hostname, gc type, datetime, duration, location, intervalms, percent";"hostname"|getline _1} 
/^<af/{split($2,_8," ");_2=_8[1];split($5,_8,"\"");_3=_8[2];split($6,_8,"\"");_6=_8[2]/1000} 
/^<tenured/{++i}i==3{i=0;split($5,_8,"\"");_7=_8[2]} 
/^<time totalms/{split($3,_8,"\"");_4=_8[2]/1000} 
/^<\/af>/{t=split(FILENAME,_8,"/");a[j++]=_1 OFS _2 OFS _3 OFS _4 OFS _8[t-1] OFS _6 OFS _7} 
END{for(i=0;i<j;i++){print a} } OFS=", "

*Not tested!

regards,
Ahamed

1 Like

Thanks a lot Ahamed. This one worked for me almost 99%. Thanks again.