Grep some values from an XML file

Dear community,
I have a big XML log file containing many records, each delimited by the tags <ActivityLogRecord> and </ActivityLogRecord>. An example is below.

What I need is to read the file, extract some values from each record, and print them on one line per <ActivityLogRecord> record.

So in the example the output should be:

COSID='88',true,Value='393290439266'

What I'm scratching my head over is that the field <WhoCalledOn>true</WhoCalledOn> is sometimes missing, and the TelephoneNumber identifier can be reported twice, something like:

<Identifier Type='TelephoneNumber' Value='324234231443'/>
<Identifier Type='TelephoneNumber' Value='324234231443'/>

This means the output I need is:

COSID='88',true,Value='393290439266',null
COSID='88',null,Value='393290439266',Value='393290439266'

In other words, if the WhoCalledOn tag is missing, the output should show NULL; otherwise it should show the value, true or false. The same goes for the TelephoneNumber tag: if it appears only once, the second telephone column should be NULL; otherwise it should show the second value.

I know this is not exactly simple, which is why I'm asking the experts! :slight_smile:

Thank you
Lucas

<ActivityLogRecord>
   <Common COSID='88' DomainID='BL' EndTimeStamp='2016-03-23T10:00:00.10+01:00' MainAction='Modify Subscriber' ReferenceNumber='192.9.224.15-1455979927516' ServerID='omu234234' ServiceName='SPM' StartTimeStamp='2016-03-23T09:59:59.850+01:00' UserID='234234234234' UserTerminal='vxv_app_user'>
      <MainActionResult Description='ERROR: 0 - Action completed successfully' Status='Success'/>
   </Common>
   <ServiceSpecific>
      <Provisioning xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
         <Request>
            <Header>
               <Command>Modify</Command>
               <EntityIdentifiers>
                  <Identifier Value='324234231443'/>
               </EntityIdentifiers>
               <Data>
                  <Subscriber>
                     <WhoCalledOn>true</WhoCalledOn>
                  </Subscriber>
               </Data>
               <HostReference>Automatic</HostReference>
            </Header>
            <Data>
               <Subscriber>
                  <SubscriberCosName>8</SubscriberCosName>
                  <SubscriberDomainName>Default domain</SubscriberDomainName>
               </Subscriber>
            </Data>
         </Request>
      </Provisioning>
      <Provisioning xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
         <Response>
            <Header>
               <EntityIdentifiers>
                  <Identifier Type='TelephoneNumber' Value='324234231443'/>
               </EntityIdentifiers>
               <EntityName>Subscriber</EntityName>
               <HostReference>Automatic</HostReference>
               <ResponseStatus>Success</ResponseStatus>
            </Header>
         </Response>
      </Provisioning>
   </ServiceSpecific>
</ActivityLogRecord>

Not sure I understand. Looking at your sample, I can't see WHAT should be reported twice. Also, is "COSID=" a constant present in all records that can be searched for, or is it part of the <Common> tag?

Please show us your attempts so far.

Well, first of all, thanks for the reply. What I need are the three values highlighted in red in my previous example.
As I wrote, COSID and TelephoneNumber are always present.
In some cases <WhoCalledOn> is not present, so the output should report NULL. In other cases the TelephoneNumber row is reported twice, so the output I need is:

COSID='88',true,Value='393290439266',null
COSID='88',null,Value='393290439266',Value='393290439266'

So, in the first case the <WhoCalledOn> tag is present and its value is true.
In the second case the <WhoCalledOn> tag is not present, so it should be reported as null; in addition, TelephoneNumber is reported twice, something like:

<EntityIdentifiers>
   <Identifier Type='TelephoneNumber' Value='324234231443'/>
   <Identifier Type='TelephoneNumber' Value='324234231443'/>
</EntityIdentifiers>

Hope this is clear,
thank you,
Lucas

Try this (run against a sample file extrapolated from yours):

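awk '
/<Activity.*>/  {WHO  = ",null"          # new record: default WhoCalledOn to null
                 TEL2 = ",null"          # and the second phone number to null
                 TCNT = 0                # phone numbers seen in this record
                }
/COSID=/        {printf "%s", $2         # COSID= is the 2nd field of the <Common> line
                }
/<WhoCalledOn>/ {gsub (/ *<[^>]*>/, _)   # _ is unset (empty): strip the tags, keep e.g. true
                 WHO = "," $0
                }
/TelephoneNum/  {sub (/.>$/, _, $3)      # drop the trailing /> from Value=...
                 printf "%s,%s", WHO, $3
                 WHO = ""                # WhoCalledOn goes before the 1st number only
                 if (TCNT++) TEL2 = ""   # a 2nd number replaces the trailing null
                }
/<\/Activit.*>/ {printf "%s%s", TEL2, RS # end of record: close the output line
                }
' file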
COSID='88',true,Value='324234231443',null
COSID='76',null,Value='324234231443',null
COSID='69',true,Value='324234231443',Value='3242XXX31443'

Thank you RudiC, you're the king :slight_smile:
I only need to fix something in the output, since I get:

,null,,true,
,null,,true,
,null,,true,
COSID='21',null,Value='4354353455',,Value='4354353455'
COSID='21',null,Value='6786786787',,Value='6786786787'
COSID='21',null,Value='345435345345',,Value=345435345345'

When I have two TelephoneNumber identifiers, I get a double comma. Most probably that's because the actual structure of the XML file is:

         <Request>
            <Header>
               <Command>Create</Command>
               <EntityName>Subscriber</EntityName>
               <EntityIdentifiers>
                  <Identifier Type='TelephoneNumber' Value='4354353455'/>
               </EntityIdentifiers>
            </Header>
            <Data>
               <Subscriber>
                  <SubscriberCosName>21</SubscriberCosName>
                  <GlobalLanguageId>it-IT</GlobalLanguageId>
                  <TuiPassword>****</TuiPassword>
                  <GenCustom6>15</GenCustom6>
                  <TelephoneNumber>4354353455</TelephoneNumber>
               </Subscriber>
            </Data>
         </Request>
      </Provisioning>
      <Provisioning xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
         <Response>
            <Header>
               <EntityIdentifiers>
                  <Identifier Type='TelephoneNumber' Value='4354353455'/>
               </EntityIdentifiers>
               <EntityName>Subscriber</EntityName>
               <ResponseStatus>Success</ResponseStatus>
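On lines like these, the `/TelephoneNum/` pattern in RudiC's script also matches `<TelephoneNumber>4354353455</TelephoneNumber>`. That line has a single field, so `$3` is empty and the rule prints a lone comma, which is where the `,,` comes from. Below is a sketch of the same script with only that pattern narrowed to `/Type=.TelephoneNumber./` (the `.` stands in for the single quote, which cannot appear inside the single-quoted awk program). It is wrapped in a hypothetical shell function, `extract_fixed`, so it can be called on a file or on standard input.

```shell
# Hypothetical wrapper around RudiC's script; the only change is the
# phone-number pattern, which now requires Type='TelephoneNumber'.
extract_fixed() {
  awk '
  /<Activity.*>/  {WHO  = ",null"
                   TEL2 = ",null"
                   TCNT = 0
                  }
  /COSID=/        {printf "%s", $2
                  }
  /<WhoCalledOn>/ {gsub (/ *<[^>]*>/, _)
                   WHO = "," $0
                  }
  # was /TelephoneNum/, which also matched the <TelephoneNumber> element
  /Type=.TelephoneNumber./ {sub (/.>$/, _, $3)
                   printf "%s,%s", WHO, $3
                   WHO = ""
                   if (TCNT++) TEL2 = ""
                  }
  /<\/Activit.*>/ {printf "%s%s", TEL2, RS
                  }
  ' "$@"
}
```

Assuming the enclosing record's `<Common>` line carries `COSID='21'` as in the output above, this record should then come out as `COSID='21',null,Value='4354353455',Value='4354353455'`. This addresses only the double comma, not the `,null,,true,` lines.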

Another strange output is:

,null,,true,
,null,,true,
,null,,true,

And I don't know why I got it.

Anyway, you saved my day, thank you! :b:

Without seeing the input that produced the output you posted, I can't comment.
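One possible reading of the `,null,,true,` lines, not confirmed in the thread: no COSID is printed, so those records may have no `COSID=` attribute at all, and the empty values between the commas suggest the phone-number rule fired on lines without a third field. A minimal sketch for pulling out such records so they can be posted, assuming the same one-tag-per-line layout; the function name `records_without_cosid` is hypothetical.

```shell
# Hypothetical helper: print each <ActivityLogRecord> block that contains no
# COSID= attribute, headed by the line number where the block starts.
records_without_cosid() {
  awk '
  /<ActivityLogRecord>/   { buf = ""; start = NR; has_cosid = 0 }
                          { buf = buf $0 "\n" }    # remember every line of the block
  /COSID=/                { has_cosid = 1 }
  /<\/ActivityLogRecord>/ { if (!has_cosid) printf "--- record starting at line %d\n%s", start, buf }
  ' "$@"
}
```

For example, `records_without_cosid activity.log > no_cosid.txt` (hypothetical file names) would collect those blocks in one file.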