XML parsing

I have XML output in the format below:

 
<AlertsResponse>
    <Alert id="11216" name="fgdfg">
        <AlertActionLog timestamp="1356521629778" user="admin" detail="Recovery Alert"/>
    </Alert>
    <Alert id="11215" name="gdfg">
        <AlertActionLog timestamp="1356430119840" user="" detail="TRAP sent successfully."/>
        <AlertActionLog timestamp="1356430247023" user="admin" detail="Recovery Alert"/>
    </Alert>
</AlertsResponse>

I want to parse the above to get the output below. How do I do it? Can anyone please explain?

 
id=11216
name=APPfixed=true
date=2012-12-25,06:38:43.6
 
id=11215
name=APPfixed=true
date=2012-12-25,06:33:43.6

Did you try anything?
Which language are you planning to use?

In Perl, the XML::Simple module is available. Read the file with the module, then use Data::Dumper to see how the data is laid out in the Perl hash. Once you know where the required data sits, pick the proper fields from the hash to get the result.

Is it possible to do that in shell? I don't know whether my machine has Perl or not. Can you guide me on how to start?

I have worked on XML parsing in Java, but XML parsing in shell is new to me. I do know basic awk/grep string manipulation in shell scripts and have been working with them for many months now.

awk -F'[=|"|<|>|,]' '{for(i=1;i<=NF;i++){
 if($i=="Alert id") {
  if(id!="") print id,nm,fx,dt;
  id=($i=="Alert id")?$(i+2):id; }
  nm=($i==" name")?$(i+2):nm;
  fx=($i==" fixed")?$(i+2):fx;
  dt=($i~/^ [0-9]+-/)?$i" "$(i+1):dt;
 }
}END{
 print id,nm,fx,dt;
}' xmlfile

Thanks a lot. The output is coming out properly:

12530 APP-MS-lib_license_common-150016-licenseHardLimitReachedBlueWaveCTIAPI-S false  2013-01-24 08:09:34.7
12529 APP-MS-lib_license_common-150040-licenseSchemaTampered-S_R true  2013-01-24 08:09:
12528 APP-MS-lib_license_common-150012-enterpriseLicenseInstallFailed-S_R true  2013-01-24 08:09:08.0
12527 APP-MS-lib_security-124005-LoginLicense-S false  2013-01-24 08:00:47.2

Can you please explain how the command works?

awk -F'[=|"|<|>|,]' '{for(i=1;i<=NF;i++){ if($i=="Alert id") {  if(id!="") print id,nm,fx,dt;  id=($i=="Alert id")?$(i+2):id; }  nm=($i==" name")?$(i+2):nm;  fx=($i==" fixed")?$(i+2):fx;  dt=($i~/^ [0-9]+-/)?$i" "$(i+1):dt; }}END{ print id,nm,fx,dt;}' xmlfile

I can see it loops over the fields and compares $i against the tag names, but when one matches, how is the value extracted?

I also need one more bit of help: if I want the output in the format below, how can I change the command?

ID=12530 
NAME=APP-MS-lib_license_common-150016-licenseHardLimitReachedBlueWaveCTIAPI-S 
FIXED=false  
DATE=2013-01-24 08:09:34.7

To change the output format:

awk -F'[=|"|<|>|,]' '{for(i=1;i<=NF;i++){
 if($i=="Alert id") {
  if(id!="") printf "ID=%d\nNAME=%s\nFIXED=%s\nDATE=%s\n", id,nm,fx,dt;
  id=($i=="Alert id")?$(i+2):id; }
  nm=($i==" name")?$(i+2):nm;
  fx=($i==" fixed")?$(i+2):fx;
  dt=($i~/^ [0-9]+-/)?$i" "$(i+1):dt;
 }
}END{
 printf "ID=%d\nNAME=%s\nFIXED=%s\nDATE=%s\n", id,nm,fx,dt;
}' xmlfile

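-F'[=|"|<|>|,]' sets = " < > and , as field separators. Inside a bracket expression the | is not an "or" but a literal character, so | becomes a separator as well.

So when $i matches an attribute name, the value is picked up by its position relative to that name. Splitting Alert id="11216" on = and " leaves an empty field between the two separators, which is why the value is at $(i+2) and not $(i+1).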


so for

Alert id="11216" 
$(i+2):id;

means: take the field two positions after i and put it in the id variable, is that right?

You are right. I adjusted the offsets according to where each of the other attribute values sits. I hope that makes sense.
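To see the offsets for yourself, here is a minimal sketch that prints every field awk produces for one Alert line. It assumes the cleaned-up separator set [="<>,] suggested later in the thread; the variable names line and fields are only for illustration.

```shell
# One Alert line from the sample XML at the top of the thread.
line='    <Alert id="11216" name="fgdfg">'

# Print each field with its index so the empty fields become visible.
fields=$(printf '%s\n' "$line" |
  awk -F'[="<>,]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Field 2 is "Alert id", field 3 is the empty string between = and the opening quote, and field 4 is 11216, so the value sits two fields after the name.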


Many thanks, Bipin. I understood.

As of now I am using the code below:

awk -F'[=|"|<|>|,]' '{for(i=1;i<=NF;i++){
 if($i=="Alert id") {
  if(id!="")
        if(dt!="" && fx == "false"){ printf "NAME=%s\nFIXED=%s\nDATE=%s\nAlDefID=%d\nResId=%d\nReason=%s\n\n", nm,fx,dt,alDFid,rId,reson; }
  id=($i=="Alert id")?$(i+2):id; }
  nm=($i==" name")?$(i+2):nm;
  fx=($i==" fixed")?$(i+2):fx;
  dt=($i~/^ [0-9]+-/)?$i" "$(i+1):dt;
  alDFid=($i==" alertDefinitionId")?$(i+2):alDFid;
  rId=($i==" resourceId")?$(i+2):rId;
reson=($i==" reason")?$(i+2):reson;
 }
}END{
 if(dt!="" && fx == "false"){
 printf "NAME=%s\nFIXED=%s\nDATE=%s\nAlDefID=%d\nResId=%d\nReason=%s\n", nm,fx,dt,alDFid,rId,reson;
}
}' alerts.xml >alertsExtracted.txt

But it fails to display the complete string of the reason field for the inputs below:

 <Alert id="10615" name="Turret-IQ/MAX-101001-NotAvailable-P" alertDefinitionId="17473" resourceId="11720" ctime="1362464100000" fixed="false" reason="If Availability != 100.0% (actual value = 0.0%)">
 
 
</Alert>
    <Alert id="10602" name="APP-MS-lib_license_common-150040-licenseSchemaTampered-S" alertDefinitionId="16315" resourceId="11424" ctime="1362398245776" fixed="false" reason="If Event/Log Level(ANY) and matching substring "licenseSchemaTampered"     Log: : 3 days, 1:57:16.57, 1.3.6.1.4.1.1453.4.9.1.3.0.1, License Entity Tampered., SET License Entity Tampered : 2003,, licenseSchemaTampered, 2013-03-04,06:57:25.7,--5:0, HIGH, , License Entity Tampered  due to change of value(s     AND Event/Log Level(ANY) and matching substring "HIGH"     Log: : 3 days, 1:57:16.57, 1.3.6.1.4.1.1453.4.9.1.3.0.1, License Entity Tampered., SET License Entity Tampered : 2003,, licenseSchemaTampered, 2013-03-04,06:57:25.7,--5:0, HIGH, , License Entity Tampered  due to change of value(s     AND Event/Log Level(ANY) and matching substring "lib_license_common"     Log: : 3 days, 1:57:16.57, 1.3.6.1.4.1.1453.4.9.1.3.0.1, License Entity Tampered., SET License Entity Tampered : 2003,, licenseSchemaTampered, 2013-03-04,06:57:25.7,--5:0, HIGH, 10.19.123.105, License Entity Tampered  due to change of value(s"/>

The output comes out as:

NAME=APP-MS-lib_license_common-150040-licenseSchemaTampered-S
FIXED=false
DATE= 2013-03-04 06:57:25.7
Description= License Entity Tampered. 
PRIORITY=HIGH
RESOURCE NAME=-pri-105
Reason=If Event/Log Level(ANY) and matching substring "licenseSchemaTampered" Log: : 3 days
NAME=CCM-Platform-106006-LinuxServerLostConnection-P
FIXED=false
DATE= 2013-03-04 06:47:25.7
Description= Linux Server lost connection (missed one polling)  Resolution hint: Check network connection 
PRIORITY=MED
RESOURCE NAME=204
Reason=If Availability < 100.0% (actual value
 

I am only interested in changing the Reason field so that it gives the complete output, as below:

NAME=APP-MS-lib_license_common-150040-licenseSchemaTampered-S
FIXED=false
DATE= 2013-03-04 06:57:25.7
Description= License Entity Tampered. PRIORITY=HIGH
RESOURCE NAME=-pri-105
Reason=If Event/Log Level(ANY) and matching substring "licenseSchemaTampered" Log: : 3 days
NAME=CCM-Platform-106006-LinuxServerLostConnection-P
FIXED=false
DATE= 2013-03-04 06:47:25.7
Description= Linux Server lost connection (missed one polling)  Resolution hint: Check network connection 
PRIORITY=MED
RESOURCE NAME=204
Reason=If Availability != 100.0% (actual value = 0.0%)

Is it possible?

The reason value is cut at the first separator character inside it, so just concatenate the next field:

reson=($i==" reason")?$(i+2)$(i+3):reson;

Also change field separators to:

awk -F'[="<>,]'
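Concatenating fields only recovers as many pieces as you add by hand. As an alternative sketch (not what was posted above), the whole attribute can be pulled out with awk's match() and substr(), so that = and , inside the value no longer matter. The function name get_reason and the shortened sample line are made up for illustration.

```shell
# Hypothetical helper: print the value of the reason attribute for each
# line of XML on stdin, without splitting the line into fields at all.
get_reason() {
  awk '{
    # match() sets RSTART and RLENGTH for the first reason="..." on the line
    if (match($0, / reason="[^"]*"/)) {
      # skip the 9 characters of [ reason="] and drop the closing quote
      print substr($0, RSTART + 9, RLENGTH - 10)
    }
  }'
}

sample='<Alert id="10615" fixed="false" reason="If Availability != 100.0% (actual value = 0.0%)">'
printf '%s\n' "$sample" | get_reason
```

This assumes the value contains no double quotes. The licenseSchemaTampered alert above has unescaped quotes inside its reason attribute, so this sketch would stop at the first inner quote for that input.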

Hi Bipinajith, I didn't understand the part where the code handles the date:

dt=($i~/^ [0-9]+-/)?$i" "$(i+1):dt;

Is it converting epoch time to normal time?

For a few of my XML files it does not pick up a date, and since there is a null check on the date, those entries are not displayed in the output.

For the previous query I implemented the part below as a quick workaround:

        if( fx == "false"){ printf "NAME=%s\nFIXED=%s\nDATE and TIME=%s\nResId=%d\nAlDefID=%d\nReason=%s%s\n\n", nm,fx,dt,rId,alDFid,reson,reson2; }
  id=($i=="Alert id")?$(i+2):id; }
  nm=($i==" name")?$(i+2):nm;
  fx=($i==" fixed")?$(i+2):fx;
  dt=($i~/^ [0-9]+-/)?$i" "$(i+1):dt;
  alDFid=($i==" alertDefinitionId")?$(i+2):alDFid;
  rId=($i==" resourceId")?$(i+2):rId;
reson=($i==" reason")?$(i+2):reson;
reson2=($i==" reason")?$(i+3):reson2;
 }
}END{
 if(fx == "false"){
 printf "NAME=%s\nFIXED=%s\nDATE AND TIME=%s\nResId=%d\nAlDefID=%d\nReason=%s%s\n", nm,fx,dt,rId,alDFid,reson,reson2;

The code is not doing any conversion.

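The regexp /^ [0-9]+-/ means the field must start (^) with a space, followed by one or more digits ([0-9]+), followed by a hyphen (-). It only picks out a field that already looks like the start of a date, such as " 2013-03-04"; the time is the field right after it.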

If this regexp does not fetch the date, you have to look at your input XML and define a regexp that covers every form the date takes there.
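A small sketch of what the date test accepts and rejects, assuming the [="<>,] separators. The helper name find_date and the two shortened input lines are invented for illustration, modelled on the alerts posted above.

```shell
# Hypothetical helper: report the first field that passes the date test,
# or say so when no field on the line does.
find_date() {
  awk -F'[="<>,]' '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^ [0-9]+-/) { print "match:" $i " " $(i+1); next }
    print "no date field"
  }'
}

# The reason text carries a log date: the comma splits it into date and time.
printf '%s\n' 'reason="Log: licenseSchemaTampered, 2013-03-04,06:57:25.7,--5:0, HIGH"' | find_date

# Only an epoch ctime and a percentage here: nothing starts with space, digits, hyphen.
printf '%s\n' 'ctime="1362464100000" reason="If Availability != 100.0%"' | find_date
```

The first line prints "match: 2013-03-04 06:57:25.7" and the second prints "no date field". That would explain the missing entries: an alert whose reason has no log date never sets dt, and the ctime attribute is left untouched as an epoch value.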