Grep some values from an XML file

Dear community,
I have a big XML log file containing several records, each delimited by the tags <ActivityLogRecord> and </ActivityLogRecord>. An example is below.

What I need is to read the file, extract some values from each record, and put them on one line (one output line for every <ActivityLogRecord> tag).

So in the example the output should be:


The problem I'm scratching my head over is that sometimes this field is missing: <WhoCalledOn>true</WhoCalledOn>, and the TelephoneNumber line can be reported twice, something like:

<Identifier Type='TelephoneNumber' Value='324234231443'/>
<Identifier Type='TelephoneNumber' Value='324234231443'/>

This means the output I need should be:


In other words, if the WhoCalledOn tag is missing, the output should contain something like NULL; otherwise it should contain the value, such as true or false. The same goes for the TelephoneNumber tag: if it is reported only once, the second column should be set to NULL; otherwise it should contain the value.

Well, I know this is not exactly simple; that's why I'm asking the experts! :slight_smile:

Thank you

   <Common COSID='88' DomainID='BL' EndTimeStamp='2016-03-23T10:00:00.10+01:00' MainAction='Modify Subscriber' ReferenceNumber='' ServerID='omu234234' ServiceName='SPM' StartTimeStamp='2016-03-23T09:59:59.850+01:00' UserID='234234234234' UserTerminal='vxv_app_user'>
      <MainActionResult Description='ERROR: 0 - Action completed successfully' Status='Success'/>
      <Provisioning xmlns:xsi=''>
                  <Identifier Value='324234231443'/>
                  <SubscriberDomainName>Default domain</SubscriberDomainName>
      <Provisioning xmlns:xsi=''>
                  <Identifier Type='TelephoneNumber' Value='324234231443'/>

Not sure I understand. Looking at your sample, I can't see WHAT should be reported twice. And, is "COSID=" a constant present in all records that can be searched for, or part of the <common> tag?

Please show us your attempts so far.

Well, first of all, thanks for the reply. What I need are the 3 values marked in red in my previous example.
As I wrote, COSID and TelephoneNumber are always present.
In some cases <WhoCalledOn> is not present, so NULL should be reported in the output. In other cases the TelephoneNumber row is reported twice, so the output I need is:


So, in the first case the <WhoCalledOn> tag is present and its value is true.
In the second case the <WhoCalledOn> tag is not present, so it should be reported as null; in addition, TelephoneNumber is reported twice. Something like:

<Identifier Type='TelephoneNumber' Value='324234231443'/>
<Identifier Type='TelephoneNumber' Value='324234231443'/>

Hope this is clear,
thank you,

Try (with an extrapolated sample file)

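awk '
/<Activity.*>/  {WHO  = ",null"                 # defaults for a new record
                 TEL2 = ",null"
                 TCNT = 0
                }
/COSID=/        {printf "%s", $2                # second field holds the COSID attribute
                }
/<WhoCalledOn>/ {gsub (/ *<[^>]*>/, _)          # strip the tags, keep the value
                 WHO = "," $0
                }
/TelephoneNum/  {sub (/.>$/, _, $3)             # drop the trailing "/>"
                 printf "%s,%s", WHO, $3
                 WHO = ""
                 if (TCNT++) TEL2 = ""          # second number seen: no padding needed
                }
/<\/Activit.*>/ {printf "%s%s", TEL2, RS        # pad if needed and end the line
                }
' file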
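
To make the behaviour of the script above easier to check, here is a small self-contained sketch that runs the same rules on a made-up two-record sample. The file name sample.xml, the COSID value 89 and the reduced set of tags and attributes are assumptions for illustration only, not the real log. The first record has <WhoCalledOn> and one telephone number; the second has no <WhoCalledOn> and two numbers.

```shell
#!/bin/sh
# Hypothetical sample: the real log carries more attributes and tags per record.
cat > sample.xml <<'EOF'
<ActivityLogRecord>
   <Common COSID='88' DomainID='BL'>
      <WhoCalledOn>true</WhoCalledOn>
      <Identifier Type='TelephoneNumber' Value='324234231443'/>
   </Common>
</ActivityLogRecord>
<ActivityLogRecord>
   <Common COSID='89' DomainID='BL'>
      <Identifier Type='TelephoneNumber' Value='4354353455'/>
      <Identifier Type='TelephoneNumber' Value='4354353455'/>
   </Common>
</ActivityLogRecord>
EOF

# Same pattern/action pairs as the script in the thread, one block per pattern.
result=$(awk '
/<Activity.*>/  {WHO = ",null"; TEL2 = ",null"; TCNT = 0}   # reset per record
/COSID=/        {printf "%s", $2}                           # COSID attribute
/<WhoCalledOn>/ {gsub (/ *<[^>]*>/, _); WHO = "," $0}       # tag value only
/TelephoneNum/  {sub (/.>$/, _, $3)                         # drop trailing "/>"
                 printf "%s,%s", WHO, $3
                 WHO = ""
                 if (TCNT++) TEL2 = ""}                     # second number found
/<\/Activit.*>/ {printf "%s%s", TEL2, RS}                   # pad and end the line
' sample.xml)

printf '%s\n' "$result"
# Prints:
# COSID='88',true,Value='324234231443',null
# COSID='89',null,Value='4354353455',Value='4354353455'
```

Note that the fields are printed as they appear in the file, attribute name and quotes included, so further trimming would be needed if only the bare values are wanted. The script also relies on the attribute order of the sample ($2 for COSID, $3 for Value), which may not hold for every line of a real log.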

Thank you RudiC, you're the king :slight_smile:
I only have to fix something in the output, since I get:


When I have two TelephoneNumber entries, I get double quotes. Most probably that is because the actual structure in the XML file is:

                  <Identifier Type='TelephoneNumber' Value='4354353455'/>
      <Provisioning xmlns:xsi=''>
                  <Identifier Type='TelephoneNumber' Value='4354353455'/>

Another strange output is:


And I don't know why I get it.

Anyway, you saved my day, thank you! :b:

Without seeing the input that resulted in the output posted I can't comment.