Extract XML block when value is matched (Shell script)

Hi everyone,

So I'm struggling with an XML log file that holds information about some devices. The log file is filled with multiple "blocks" like the one below.

Based on the <devId> I want to extract this part of the XML file. If possible I'd like to have a script for this, because we'll use this function quite a lot.

I already tried grep, but I only get the line with the devId, which isn't the result I want.
I also tried to fiddle with xmllint, but my knowledge isn't advanced enough to make it work.

2019-06-16 20:20:11,695 | INFO  | CHEDULER_TOVSDC] | vsdc                             | 94 - org.apache.camel.camel-core - 2.17.3 | There is no task ack to send
2019-06-16 20:20:11,901 | INFO  | r[VSDCFE_ALARMS] | vsdc                             | 94 - org.apache.camel.camel-core - 2.17.3 | VSDCFE_ALARMS: Device alarm isssue message received contains {"time":"2019-06-16T18:20:10","deviceId":"number","alarms":[{"obis":"0;0;97;98;20;255","attributeId":"2","classId":"1","value":"0600000000"},{"obis":"0;0;97;98;21;255","attributeId":"2","classId":"1","value":"0600000000"},{"obis":"0;0;97;98;22;255","attributeId":"2","classId":"1","value":"0600800000"}]}
2019-06-16 20:20:11,914 | INFO  | r[VSDCFE_ALARMS] | AlarmTaskProcessor               | 237 - vsdc-alarm-manager - 2.0.83.43 | Alarm descriptor value is <= 0
2019-06-16 20:20:11,914 | INFO  | r[VSDCFE_ALARMS] | AlarmTaskProcessor               | 237 - vsdc-alarm-manager - 2.0.83.43 | Alarm descriptor value is <= 0
2019-06-16 20:20:11,914 | INFO  | r[VSDCFE_ALARMS] | vsdc                             | 94 - org.apache.camel.camel-core - 2.17.3 | Sending alarm device task request xml to task manager:  <?xml version="1.0" encoding="UTF-8"?>
<taskReq
    xmlns=""
       taskId="ALARM_number_1560709211907" taskType="DLMS" version="4" isActivation="false"
        execPriority="3">
    <targets>
        <devID>number</devID>
    </targets>
    <schedule>
        <start>2019-06-16T20:23:11.907+02:00</start>
        <stop>2019-06-16T21:20:11.908+02:00</stop>
    </schedule>
    <dlmsParams mode="unicast">
        <unicast timeout="45" maxTry="0" />
    </dlmsParams>
    <resultParams>
        <priority>urgent</priority>
        <mode>all</mode>
        <useCache>none</useCache>
    </resultParams>
    <transactions count="1">
            <transaction id="1">
                <dlms operation="SETM" association="1" >
                    <setm order="1" obis="0;0;97;98;22;255" attribute="2" classId="1">
                        <xdr>0600800000</xdr>
                    </setm>
                    <setm order="2" obis="0;0;97;98;2;255" attribute="2" classId="1">
                        <xdr>0600000000</xdr>
                       </setm>
                </dlms>
            </transaction>
    </transactions>
</taskReq>

2019-06-16 20:20:11,938 | INFO  | ActiveMQ Task-1  | FailoverTransport                | 77 - org.apache.activemq.activemq-osgi - 5.12.1 | Successfully connected to 
2019-06-16 20:20:11,950 | INFO  | CHEDULER_TOVSDC] | TaskProcessor                    | 246 - vsdc-task-manager - 2.0.83.43 | Task Scheduling validation
2019-06-16 20:20:11,952 | INFO  | CHEDULER_TOVSDC] | TaskProcessor                    | 246 - vsdc-task-manager - 2.0.83.43 | save task taskid ALARM_number_1560709211907
2019-06-16 20:20:11,956 | INFO  | r[VSDCFE_ALARMS] | DeviceIssueAlarmProcessor        | 237 - vsdc-alarm-manager - 2.0.83.43 | Starting device issue alarm process
2019-06-16 20:20:11,961 | INFO  | CHEDULER_TOVSDC] | SchedulerProcessor               | 246 - vsdc-task-manager - 2.0.83.43 | schedule creator for Task taskId: ALARM_number_1560709211907
2019-06-16 20:20:11,961 | INFO  | r[VSDCFE_ALARMS] | ldnFromDinAdapterRouteAlarm      | 94 - org.apache.camel.camel-core - 2.17.3 | message send from vsdc : <?xml version="1.0" encoding="UTF-8"?><alarms xmlns="">
    <deviceAlarms>
        <alarm devId="number" obis="0;0;97;98;20;255" classId="1" attribute="2" time="2019-06-16T20:20:10.000+02:00">0600000000</alarm>
        <alarm devId="number" obis="0;0;97;98;21;255" classId="1" attribute="2" time="2019-06-16T20:20:10.000+02:00">0600000000</alarm>
        <alarm devId="number" obis="0;0;97;98;22;255" classId="1" attribute="2" time="2019-06-16T20:20:10.000+02:00">0600800000</alarm>
    </deviceAlarms>
</alarms> 
2019-06-16 20:20:11,961 | INFO  | CHEDULER_TOVSDC] | SchedulerProcessor               | 246 - vsdc-task-manager - 2.0.83.43 | NON PERIODIC
2019-06-16 20:20:11,962 | INFO  | CHEDULER_TOVSDC] | SchedulerProcessor               | 246 - vsdc-task-manager - 2.0.83.43 | prepare Scheduling taskId [ALARM_number_1560709211907]
2019-06-16 20:20:11,962 | INFO  | CHEDULER_TOVSDC] | JobsDatePlannerServiceImpl       | 246 - vsdc-task-manager - 2.0.83.43 | First creator execution Date: 2019-06-16T20:22:11.907+02:00
2019-06-16 20:20:11,962 | INFO  | CHEDULER_TOVSDC] | SchedulerProcessor               | 246 - vsdc-task-manager - 2.0.83.43 | STOP DATE = 2019-06-16T21:20:11.908+02:00
2019-06-16 20:20:11,963 | INFO  | ActiveMQ Task-1  | FailoverTransport                | 77 - org.apache.activemq.activemq-osgi - 5.12.1 | Successfully connected to 
2019-06-16 20:20:11,974 | INFO  | CHEDULER_TOVSDC] | SchedulerProcessor               | 246 - vsdc-task-manager - 2.0.83.43 | schedule finalizer for Task taskId: ALARM_number_1560709211907
2019-06-16 20:20:11,975 | INFO  | CHEDULER_TOVSDC] | SchedulerProcessor               | 246 - vsdc-task-manager - 2.0.83.43 | NON PERIODIC
2019-06-16 20:20:11,975 | INFO  | r[VSDCFE_ALARMS] | vsdmc                             | 94 - org.apache.camel.camel-core -  2.17.3 | Device alarm message is sent to M2M containing <?xml  version="1.0" encoding="UTF-8"?><alarms  xmlns="">
    <deviceAlarms>
        <alarm devId="number" obis="0;0;97;98;20;255" classId="1" attribute="2" time="2019-06-16T20:20:10.000+02:00">0600000000</alarm>
        <alarm devId="number" obis="0;0;97;98;21;255" classId="1" attribute="2" time="2019-06-16T20:20:10.000+02:00">0600000000</alarm>
        <alarm devId="number" obis="0;0;97;98;22;255" classId="1" attribute="2" time="2019-06-16T20:20:10.000+02:00">0600800000</alarm>
    </deviceAlarms>
</alarms>

Can you post expected output?

Hi anbu,

So the code block that I posted is the result that I want to achieve. We have multiple blocks like that (we're close to 1M lines) and I just want to extract one block out of it.
I don't know if it would help to give you a bigger sample.

I think this is what you want.
It will extract all lines between successive "<devID>number</devID>" lines.

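#!/bin/bash
# Usage: ./script.sh <logfile
# "read -r line" strips leading whitespace, so indented <devID> tags
# still match, but the copied lines lose their indentation.
write="N"
count=0
while read -r line
do
    # Open a block on a <devID> line, but only when no block is open.
    if [ "$write" = "N" ] && [ "${line:0:7}" = "<devID>" ]
    then
        write="Y"
        count=0
        #cat /dev/null >output.log
        #uncomment the above line to create a log of only the last block,
        #otherwise all blocks will be extracted
    fi
    if [ "$write" = "Y" ]
    then
        printf '%s\n' "$line" >>output.log
        count=$((count + 1))
    fi
    # Close the block on the next <devID> line (not the one that opened it).
    if [ "$write" = "Y" ] && [ "$count" -ne 1 ]
    then
        if [ "${line:0:7}" = "<devID>" ]
        then
            write="N"
            count=0
        fi
    fi
done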

 

First of all, thank you for your response, jgt :)
The answer that you gave me isn't actually the result that I want to achieve. (My explanation wasn't clear either, so the fault lies with me.)
I'll try to explain it a little better.

So we have log files on our VMs, and we get information from some devices through SIM cards. Due to bad practices of our subcontractors, we have to fetch some information from the logs.
But because we have more than 4k devices that send logs daily, we have to work out from those 400k+ line log files what kind of information each device sent.

The snippet that I've put in my initial post is actually the final result that I want from the log files. So you could say that above and below my snippet there is a multitude of information and XML that I don't really need.

Unfortunately there isn't really a specific line that I have to look for.

if [ "${line:0:7}" = "<devID>" ]

The only thing I noticed is that my block could also start from:

2019-06-16 20:20:11,901 | INFO  | r[VSDCFE_ALARMS] | vsdc                             | 94 - org.apache.camel.camel-core - 2.17.3 | VSDCFE_ALARMS: Device alarm isssue message received contains {"time":"2019-06-16T18:20:10","deviceId":"number","alarms":[{"obis":"0;0;97;98;20;255","attributeId":"2","classId":"1","value":"0600000000"},{"obis":"0;0;97;98;21;255","attributeId":"2","classId":"1","value":"0600000000"},{"obis":"0;0;97;98;22;255","attributeId":"2","classId":"1","value":"0600800000"}]}

and end with

</alarms>

so everything in between has to be included in the output. Note also that the "blocks" aren't all the same. Depending on the information, a block could contain 10 or more lines for each device.
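
The start/end rule described above could be sketched roughly like this. It is only a sketch under assumptions: the function name extract_block is made up for illustration, the JSON is assumed to be written without spaces exactly as in the sample ("deviceId":"number"), and the block is assumed to end at the first </alarms> that follows the "sent to M2M" line, since the sample contains an earlier </alarms> that still belongs to the block.

```shell
#!/bin/sh
# extract_block DEVICE_ID LOGFILE   (hypothetical helper, not an existing tool)
# Prints every line from the JSON "deviceId" log line of the given device
# up to the </alarms> that closes the "sent to M2M" XML.
extract_block() {
    awk -v id="$1" '
        # Start marker: the JSON log line that names this device.
        !inblock && index($0, "\"deviceId\":\"" id "\"") { inblock = 1; final = 0 }
        inblock { print }
        # Assumption: the last XML document of a block follows this log line.
        inblock && index($0, "sent to M2M") { final = 1 }
        # End marker: the first </alarms> after the "sent to M2M" line.
        inblock && final && index($0, "</alarms>") { inblock = 0 }
    ' "$2"
}
```

It would be called as extract_block number logfile >block.log. If two devices log at the same moment their lines may interleave, and this sketch would copy the foreign lines as well.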

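If you want to print every block that contains the pattern devId="number", use </alarms> as the record separator and match the pattern as a regular expression. Without the slashes, devId="number" is an assignment that is always true, so every record would be printed. A multi-character RS needs gawk or mawk.

awk -v RS='</alarms>' -v ORS='</alarms>\n' '/devId="number"/' logfile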
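
POSIX leaves the behaviour of a multi-character RS unspecified, so the one-liner depends on the awk in use. A portable sketch of the same idea collects lines in a buffer and decides what to do each time a line containing </alarms> is reached. The function name print_matching_records is made up for illustration, and the sketch assumes </alarms> always ends a record, as in the sample log.

```shell
#!/bin/sh
# print_matching_records DEVICE_ID LOGFILE   (hypothetical helper)
# Treats everything up to and including a line with </alarms> as one record
# and prints the record only if it contains devId="DEVICE_ID".
print_matching_records() {
    awk -v id="$1" '
        BEGIN { pat = "devId=\"" id "\"" }
        # Collect the current record line by line.
        { buf = buf $0 "\n" }
        # A line with </alarms> closes the record: print it on a match, then reset.
        index($0, "</alarms>") {
            if (index(buf, pat)) printf "%s", buf
            buf = ""
        }
    ' "$2"
}
```

Usage would be print_matching_records number logfile. Anything after the last </alarms> in the file is dropped, and each record still carries the unrelated log lines that precede its XML.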