Parse XML with Huge Data

pareshkp · February 25, 2016, 8:57pm

Hi guys,

I have a big XML file in the format below.

Input:

<pokl>MKL=1,FN=1,GBNo=B10C</pokl>
<d>192</d>
<d>315</d>
<d>35</d>
<d>0,7,8</d>
<pokl>MKL=1,dFN=1,GBNo=B11C</pokl>
<d>162</d>
<d>315</d>
<d>35</d>
<d>0,5,6</d>
<pokl>MKL=1,dFN=1,GBNo=B12C</pokl>
<d>188</d>
<d>315</d>
<d>33</d>
<d>0,3,4</d>
<pokl>MKL=1,dFN=1,GBNo=B13C</pokl>
<d>192</d>
<d>315</d>
<d>35</d>
<d>0,1,2</d>

Desired output:

B10C 192;315;35;0,7,8 
B11C 162;315;35;0,5,6
B12C 188;315;35;0,3,4
B13C 192;315;35;0,1,2

---------- Post updated at 08:57 PM ---------- Previous update was at 08:41 PM ----------

Got the answer.

Thanks

MasWag · February 25, 2016, 10:04pm

With GNU sed (it relies on \| in the pattern and \n in the replacement), like this:

cat input.xml | tr -d "\n" | sed 's:</d><pokl>\|$:\n:g;' | sed 's/^.*GBNo=//;s:</pokl><d>: :;s:</d><d>:;:g;s:</d>::;'

danmero · February 27, 2016, 11:59am

awk -F'[<>]' '/pokl/{split($3,a,"[,|=]");printf "%s ",a[6];for(i=1;i<5;i++){getline;printf "%s%s",$3,(i==4)?RS:";"}}' file
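
For readers who find the one-liner above dense, here is a commented sketch of the same getline approach. The function name pokl_getline is hypothetical, and the check on getline's return value is an addition that is not in the original command. It assumes, as the original does, that exactly four <d> lines follow each <pokl> line.

```shell
#!/bin/sh
# pokl_getline: hypothetical helper name; reads the named files, or stdin if none.
pokl_getline() {
    awk -F'[<>]' '
    /pokl/ {
        # With "<" and ">" as separators, $3 is the text inside the element,
        # e.g. "MKL=1,FN=1,GBNo=B10C". Splitting it on "," and "=" puts the
        # GBNo value in the sixth piece.
        split($3, part, "[,=]")
        printf "%s ", part[6]

        # Assumption: exactly four <d> lines follow each <pokl> line.
        for (i = 1; i <= 4; i++) {
            if ((getline) <= 0) break    # added guard: stop if input ends early
            printf "%s%s", $3, (i == 4 ? "\n" : ";")
        }
    }' "$@"
}

# Usage with the third record of the sample input:
pokl_getline <<'EOF'
<pokl>MKL=1,dFN=1,GBNo=B12C</pokl>
<d>188</d>
<d>315</d>
<d>33</d>
<d>0,3,4</d>
EOF
# prints: B12C 188;315;33;0,3,4
```

Because the <d> lines are consumed by getline inside the action, the /pokl/ pattern is only ever tested against the lines that start a record.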

Don_Cragun · February 27, 2016, 2:54pm

Hi pareshkp,
You might also want to try a slightly simpler awk script:

awk -F'[<=>]' '{printf("%s%s",$(NF-2),(NR%5)?(NR%5==1)?" ":";":ORS)}' Input

which with your sample input produces the output:

B10C 192;315;35;0,7,8
B11C 162;315;35;0,5,6
B12C 188;315;33;0,3,4
B13C 192;315;35;0,1,2

which differs from the output you said you wanted in two places:

there is no space character following the 8 at the end of the first line, and
there is a 33 in the 3rd line where you said you wanted a 35.

The output shown here seems to match the sample input provided better than the output you said you wanted.
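
The nested conditional in the one-liner above can be hard to read, so here is a sketch that spells it out with comments. The function name join_by_position is hypothetical. The sketch relies on the same assumption as the one-liner: every record is exactly five lines (one <pokl> followed by four <d>), and the file contains no blank or extra lines.

```shell
#!/bin/sh
# join_by_position: hypothetical helper name; reads the named files, or stdin if none.
join_by_position() {
    awk -F'[<=>]' '
    {
        # With "<", "=" and ">" all acting as separators, the value of interest
        # is always the third field from the end: the GBNo value on a <pokl>
        # line, the element text on a <d> line.
        value = $(NF - 2)

        # Assumption: five lines per record, so NR % 5 gives the position of
        # the current line within its record.
        pos = NR % 5
        if (pos == 1)      sep = " "    # after the GBNo value
        else if (pos == 0) sep = ORS    # after the fourth <d>: end the output line
        else               sep = ";"    # between <d> values

        printf "%s%s", value, sep
    }' "$@"
}

# Usage with the first two records of the sample input:
join_by_position <<'EOF'
<pokl>MKL=1,FN=1,GBNo=B10C</pokl>
<d>192</d>
<d>315</d>
<d>35</d>
<d>0,7,8</d>
<pokl>MKL=1,dFN=1,GBNo=B11C</pokl>
<d>162</d>
<d>315</d>
<d>35</d>
<d>0,5,6</d>
EOF
# prints:
# B10C 192;315;35;0,7,8
# B11C 162;315;35;0,5,6
```

Since this version depends only on line position, a single stray line anywhere in the file would shift every record after it, whereas an approach keyed on the <pokl> tag would resynchronise at the next record.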

danmero · February 28, 2016, 8:14am

Hi Don Cragun, your solution is simpler and runs 30% faster than mine. Thanks.