XML to CSV

I want to parse the XML below using shell scripting.

Thanks in advance.

<md>
<neid>
<neun>1523</neun>
<nedn>XXX1212</nedn>
<nesw>fffff12515</nesw>
</neid>
<mi>
<mts>20141128001500</mts>
<gp>550</gp>
<mt>pmct1</mt>
<mt>pmNo2</mt>
<mt>pmNo3S</mt>
<mv>
<moid>Ma=1,Rn=1,Ul=311C</moid>
<r>11</r>
<r>21</r>
<r>0</r>
</mv>
<mv>
<moid>Ma=1,Rn=1,Ul=311B</moid>
<r>4</r>
<r>11</r>
<r>0</r>
</mv>
<mv>
<moid>Ma=1,Rn=1,Ul=347C</moid>
<r>11</r>
<r>43</r>
<r>1</r>
</mv>
</mi>
</md>



Desired output:
UC      pmct1   pmNo2   pmNo3S
311C    11      21      0
311B    4       11      0
347C    11      43      1


What have you tried?

I have tried the following, but it's not working:

awk -F'[\"\>\<]' -v OFS=',' 'BEGIN{print "''" } /pmct1/{a=$3} /pmNo2/{b=$3}/pmNo3S/{print a,b,$3}' $XXX > $test
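Why this attempt only ever prints the header: the patterns /pmct1/, /pmNo2/ and /pmNo3S/ match the `<mt>` lines, and on those lines field 3 is the counter *name*, not a value. The values are on the `<r>` lines, which none of the patterns select. A minimal demonstration on an excerpt of the question's input (the separator is simplified to `[<>]` here, an assumption that splits these particular lines the same way):

```shell
#!/bin/sh
# Excerpt of the question's input: the three <mt> lines plus one <r> line
sample='<mt>pmct1</mt>
<mt>pmNo2</mt>
<mt>pmNo3S</mt>
<r>11</r>'

# Same pattern/action logic as the attempt, separator simplified to [<>]
result=$(printf '%s\n' "$sample" |
    awk -F'[<>]' -v OFS=',' '/pmct1/{a=$3} /pmNo2/{b=$3} /pmNo3S/{print a, b, $3}')

# Shows the counter names only: the <r>11</r> line is never selected
echo "$result"
```

To get values, the script has to act on the `<moid>` and `<r>` lines instead, which is what the replies below do.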

What is $XXX?

This works...

awk -f x.awk <input-xml-file>

where x.awk contains:

# Header line
BEGIN {printf("%-10s%-10s%-10s%-10s\n","UC","pmct1","pmNo2","pmNo3S")}
# On each <moid> line: keep the text after the 3rd "=" up to "<" (e.g. 311C),
# then read the next three <r> lines and keep the text between ">" and "<".
/<moid>/ {
        split($0,A,"="); split(A[4],B,"<"); UC= B[1]
        getline; split($0,A,"<");split(A[2],B,">");pmct1=B[2]
        getline; split($0,A,"<");split(A[2],B,">");pmNo2=B[2]
        getline; split($0,A,"<");split(A[2],B,">");pmNo3S=B[2]
        printf("%-10s%-10s%-10s%-10s\n",UC,pmct1,pmNo2,pmNo3S)
        }
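To see how the split() calls in x.awk pick out the values, here is a small sketch that applies the same two-step splits to one `<r>` line and one `<moid>` line taken from the question. Note that x.awk assumes exactly three `<r>` lines follow each `<moid>`; if the number of counters changes, its getline lines have to change with it.

```shell
#!/bin/sh
# Apply x.awk's split steps to sample lines and print the pieces it keeps
pieces=$(awk 'BEGIN {
    line = "<r>11</r>"
    n = split(line, A, "<")        # A[1]="", A[2]="r>11", A[3]="/r>"
    split(A[2], B, ">")            # B[1]="r", B[2]="11"
    print n, B[2]

    moid = "<moid>Ma=1,Rn=1,Ul=311C</moid>"
    split(moid, A, "=")            # A[4]="311C</moid>"
    split(A[4], B, "<")            # B[1]="311C"
    print B[1]
}')
echo "$pieces"
```

The first output line is the piece count and the value (`3 11`), the second is the UC code (`311C`).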

I think $XXX is the forbidden variable of which we do not speak (kidding)


Hello pareshkp,

The following may also help you with this.

awk 'BEGIN{OFS="\t";print "UC" OFS "pmct1" OFS "pmNo2" OFS "pmNo3S"} /<moid>/ {match($0,/Ul=[^<]*/);X[++o]=substr($0,RSTART+3,RLENGTH-3);while((getline) > 0 && $0 !~ /<\/mv>/){match($0,/[0-9]+/);X[o]=X[o] OFS substr($0,RSTART,RLENGTH)}} END{for(i=1;i<=o;i++){print X[i]}}' Input_file

Output will be as follows.

UC      pmct1   pmNo2   pmNo3S
311C    11      21      0
311B    4       11      0
347C    11      43      1

In this solution I have hardcoded the header line (UC pmct1 pmNo2 pmNo3S), as UC does not appear anywhere in the input. Kindly let us know if this helps.
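If you would rather not hardcode the counter names, they can be collected from the `<mt>` lines, since those appear before the first `<moid>` in the sample. This is a sketch under that assumption; it also assumes one element per line and a single `<mi>` block per file (a second `<mi>` block would append its names to the header again). "UC" stays a fixed label because it does not appear in the input.

```shell
#!/bin/sh
# Sample input: a cut-down copy of the XML from the question
xml=$(mktemp)
cat > "$xml" <<'EOF'
<mi>
<mt>pmct1</mt>
<mt>pmNo2</mt>
<mt>pmNo3S</mt>
<mv>
<moid>Ma=1,Rn=1,Ul=311C</moid>
<r>11</r>
<r>21</r>
<r>0</r>
</mv>
<mv>
<moid>Ma=1,Rn=1,Ul=311B</moid>
<r>4</r>
<r>11</r>
<r>0</r>
</mv>
</mi>
EOF

# With -F'[<>]', $2 is the tag name and $3 the text between the tags
table=$(awk -F'[<>]' -v OFS='\t' '
    # Collect counter names from <mt> lines
    $2 == "mt"   { hdr = hdr OFS $3; next }
    # At the first <moid>, print the header; then start a row with the Ul code
    $2 == "moid" { if (!printed++) print "UC" hdr
                   match($3, /Ul=[^,]*/)
                   row = substr($3, RSTART + 3, RLENGTH - 3); next }
    # Append each counter value, print the row at </mv>
    $2 == "r"    { row = row OFS $3; next }
    $2 == "/mv"  { print row }
' "$xml")
echo "$table"
rm -f "$xml"
```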

EDIT: Here is a multi-line form of the same solution, which you can also use as a script.

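awk '
BEGIN{OFS="\t"; print "UC" OFS "pmct1" OFS "pmNo2" OFS "pmNo3S"}
/<moid>/ {
        # Ul=311C -> 311C (stop at the "<" of </moid>)
        match($0,/Ul=[^<]*/)
        X[++o]=substr($0,RSTART+3,RLENGTH-3)
        # Append each <r> value until the closing </mv> tag (or end of file)
        while((getline) > 0 && $0 !~ /<\/mv>/){
                match($0,/[0-9]+/)
                X[o]=X[o] OFS substr($0,RSTART,RLENGTH)
        }
}
# Print rows in input order ("for (i in X)" order is unspecified)
END{for(i=1;i<=o;i++) print X[i]}
' xml_test    # xml_test is the input file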

Thanks,
R. Singh


Also try:

awk -F'[><=]' '/^<moid/{k=1; if(p)print p; p=$6}k && /^<r/{p=p OFS $3}END{print p}' infile
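The `^<moid` and `^<r` anchors above assume every element starts in column 1. If the XML is pretty-printed with indentation, those lines are skipped and only an empty line is printed. A sketch that strips leading blanks first (assigning to $0 via sub() makes awk re-split the fields, so $6 and $3 still line up); the indented sample is hypothetical, built from the question's data:

```shell
#!/bin/sh
# Hypothetical indented version of part of the question's input
input='<mv>
    <moid>Ma=1,Rn=1,Ul=311C</moid>
    <r>11</r>
    <r>21</r>
    <r>0</r>
</mv>
<mv>
    <moid>Ma=1,Rn=1,Ul=347C</moid>
    <r>11</r>
    <r>43</r>
    <r>1</r>
</mv>'

rows=$(printf '%s\n' "$input" | awk -F'[><=]' '
    { sub(/^[ \t]+/, "") }                 # drop indentation; fields are re-split
    /^<moid/ { k = 1; if (p) print p; p = $6 }
    k && /^<r/ { p = p OFS $3 }
    END { print p }')
echo "$rows"
```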

-- edit --

With a header:

awk -F'[><=]' 'FNR==1{print "UC","pmct1","pmNo2","pmNo3S"}/^<moid/{k=1; if(p)print p; p=$6}k && /^<r/{p=p OFS $3}END{print p}' OFS="\t" infile
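Since the thread title asks for CSV, the same command only needs a comma as OFS. This sketch assumes the values never contain commas or double quotes (true for the sample data), so no CSV quoting is done; the tiny input file is a cut-down copy of the question's XML.

```shell
#!/bin/sh
# Cut-down copy of the question's XML, one element per line
xml=$(mktemp)
printf '%s\n' '<md>' '<mv>' '<moid>Ma=1,Rn=1,Ul=311B</moid>' \
    '<r>4</r>' '<r>11</r>' '<r>0</r>' '</mv>' '</md>' > "$xml"

csv=$(awk -F'[><=]' -v OFS=',' '
    FNR == 1 { print "UC", "pmct1", "pmNo2", "pmNo3S" }   # header, once per input file
    /^<moid/ { k = 1; if (p) print p; p = $6 }
    k && /^<r/ { p = p OFS $3 }
    END { print p }
' "$xml")
echo "$csv"
rm -f "$xml"
```

Redirect the output (for example `> out.csv`) to get a file that spreadsheet tools can open.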