Using awk for converting xml to txt

Hi,

I have a xml script, I converted it to .txt with values comma seperated using awk function. But I want the output values should be inside double quotes

My xml script (Workorders.xml) is shown like below:

<?xml version="1.0" encoding="utf-8" ?>
<scbm-extract version="3.3">
<workOrderList>
<workOrder>
<workOrderCode>313194073</workOrderCode>
<branchCode>2021:1206</branchCode>
<demand></demand>
<demandLineItem></demandLineItem>
<priority></priority>
<completionDate>2016-07-10T08:00:00</completionDate>
</workOrder>

-----------------------------------------------------------------------------

I used the below awk command to do this (awk script copied from previous posts)

awk -f jobxml.awk WorkOrders.xml

jobxml.awk

BEGIN { FS="[<>]";      OFS=","         }
# Single close-tag
(NF==3) && /^[ \t]*<[/]/        {

        $0=""

        if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }

        for(N=1; N<=L; N++)     {       $N=A[T[N]];     delete A[T[N]]  }
        print
}

$2 && $4 && ($2 == substr($4, 2)) {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}

Output comes like below:

workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
313194073,2021:1206,,,,2016-07-10T08:00:00

I want to show output like below. ie output values with double quotes

workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"

Please help me on this

Thanks,
Viswa

Hello Viswanatheee55,

Could you please try following and let me know if this helps you, I have only tested this with your provided Input_file.

awk 'BEGIN { FS="[<>]";      OFS=",";s1="\""         }
# Single close-tag
(NF==3) && /^[ \t]*<[/]/        {
         $0=""
         if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }
         for(N=1; N<=L; N++)     {       $N=s1 A[T[N]] s1;     delete A[T[N]]  }
        print 
         }
 $2 && $4 && ($2 == substr($4, 2)) {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}'   Input_file

Output will be as follows.

workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"
 

Above has been tested with your provided Input_file, if you have more things to do then please mention into your post with complete details.

Thanks,
R. Singh

1 Like

Try using "," as the field separator

awk '
  $1=="/" idx {
    if(!Rec) Header=Header "," idx
    Values=Values "\",\"" val
  } 
  {
    idx=$1
    val=$2
  } 
  $1=="/workOrder" {
    if(!Rec++) {
      sub(/,/,x,Header)           # Remove excess field separator
      print Header
    }
    sub(/","/,x,Values)           # Remove excess field separator
    print "\"" Values "\""        # Print adding extra quotes around the Values string
    Values=""
  }
' RS=\< FS=\> file

Output:

workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"
1 Like

Hi,

When I use the above code it shows like below

awk -f jobxml.awk WorkOrders.xml
awk: jobxml.awk:1: awk 'BEGIN { FS="[<>]";      OFS=",";s1="\""         }
awk: jobxml.awk:1:     ^ invalid char ''' in expression

Thanks,
Viswa

---------- Post updated at 03:49 AM ---------- Previous update was at 03:43 AM ----------

Sorry I got it. That's my typo .

Thanks,
Viswa

Hello Viswa,

Solution posted in POST#2 by me worked for me. Could you please try following and let me know if this helps you.

awk -vs1="\"" 'BEGIN { FS="[<>]";      OFS=",";}
# Single close-tag
(NF==3) && /^[ \t]*<[/]/        {
         $0=""
         if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }
         for(N=1; N<=L; N++)     {       $N=s1 A[T[N]] s1;     delete A[T[N]]  }
        print 
}
 $2 && $4 && ($2 == substr($4, 2)) {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}'   Input_file

Output will be as follows.

workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"

Thanks,
R. Singh

I got it Ravinder. Thank you so much for your help

Thanks,
Viswa

---------- Post updated at 06:04 AM ---------- Previous update was at 03:49 AM ----------

Hi Ravinder,

One more requirement

I dont want to list whole xml tags, Just need to list few xml tags.

I want to list the rows only for "workOrderCode,branchCode,demand,completionDate" columns

That means output should be like

workOrderCode,branchCode,demand,completionDate
"313194073","2021:1206","","2016-07-10T08:00:00"

Thanks,
Viswa

Hello Viswa,

Could you please try following and let me know if this helps.

awk -vs1="\"" 'BEGIN { FS="[<>]";      OFS=",";}
# Single close-tag
(NF==3) && /^[ \t]*<[/]/        {
         $0=""
         if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }
         for(N=1; N<=L; N++)     {       $N=s1 A[T[N]] s1;     delete A[T[N]]  }
        print
}
$2 && $4 && ($2 == substr($4, 2)) && $2 !~ /demandLineItem/ && $2 !~ /priority/{
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}'   Input_file

Output will be as follows.

workOrderCode,branchCode,demand,completionDate
"313194073","2021:1206","","2016-07-10T08:00:00"

Thanks,
R. Singh

Hi Ravinder,

Sorry for posting my requirement one by one.

awk script for above requirement works fine, Now added few more sub tags added in my xml file like below

<?xml version="1.0" encoding="utf-8" ?>
<scbm-extract version="3.3">
<workOrderList>
<workOrder>
<workOrderCode>313194073</workOrderCode>
<branchCode>2021:1206</branchCode>
<demand></demand>
<demandLineItem></demandLineItem>
<priority></priority>
<completionDate>2016-07-10T08:00:00</completionDate>
<workOrderOperation>
<operationCode>313193494::65:203629:100</operationCode> 
</workOrderOperation>
<workOrderOperation>
<operationCode>313193494::65:203630:100</operationCode>
</workOrderOperation>
<workOrderOperation>
<operationCode>313193494::65:203631:100</operationCode>
</workOrderOperation>
</workOrder>

I used your same awk script but getting output like this.

workOrderCode,branchCode,demand,completionDate,operationCode
"313194073","2021:1206","","2016-07-10T08:00:00","313193494::65:203629:100"
"","","","","313193494::65:203630:100"
"","","","","313193494::65:203631:100"
"","","","",""

That means for operationCode "313193494::65:203630:100" and "313193494::65:203631:100" showing null value instead of showing workOrderCode as "313194073"

Thanks,
Viswa

You posted requirements and got working code for those requirements. If you have a new set of requirements, try modifying the script you were given to meet those new requirements. (Do NOT assume that this forum is here to act as your unpaid programming staff.) If you can't get the code to work for your new requirements, start a new thread and provide new sample input, new sample output, COMPLETE requirements, show us the code you have attempted to write to meet your new requirements, and show us the output your code produces (including any diagnostic messages).