Hi all,
I am trying to generate an XML file from a flatfile using ksh/bash (I could also use Perl at a pinch, but I'm out of my depth there!).
I have found several good solutions on this very forum for cases where the header line of the file supplies the XML tags. However, my flatfile looks like this:
Object,Type
Table1,Tables
Table2,Tables
Table3,Tables
View1,Views
View2,Views
Proc1,Procs
Proc2,Procs
And I want to create the following:
<Whatever>
<Tables>
Table1
Table2
Table3
</Tables>
<Views>
View1
View2
</Views>
<Procs>
Proc1
Proc2
</Procs>
</Whatever>
So I essentially want the data to be segregated by one of the data columns in the flatfile, rather than just a more straightforward 'header becomes a tag' scenario.
All pointers much appreciated!
*Edit*
Although the data should always arrive grouped by type, I would be interested to see whether a solution could also handle unordered input like this:
Object,Type
Table1,Tables
Proc1,Procs
View1,Views
Table2,Tables
Table3,Tables
View2,Views
Proc2,Procs
Thanks,
Ian
Something like this:
awk -F\, 'ant!=$NF{if(ant!=""){print "</"ant">"};print "<"$NF">\n "$1;ant=$NF;next}{print " "$1}END{print "</"ant">"}' infile
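For anyone less familiar with awk, here is the same one-liner spread out with comments. It is a sketch, not the poster's exact code: the function name `group_sorted` is hypothetical, and the `END` block adds a small guard so that empty input does not print a stray `</>`. Like the one-liner, it assumes the rows are already grouped by type, and it does not skip a header line or print an outer wrapper element.

```shell
# Hypothetical wrapper name; reads the file(s) given as arguments, or stdin.
group_sorted() {
    awk -F',' '
        # The type ($NF, the last field) differs from the previous row:
        # close the previous element (if any) and open a new one.
        prev != $NF {
            if (prev != "") print "</" prev ">"
            print "<" $NF ">"
            print " " $1
            prev = $NF
            next
        }
        # Same type as the previous row: just print the object name.
        { print " " $1 }
        # Close the last element that was opened (guarded for empty input).
        END { if (prev != "") print "</" prev ">" }
    ' "$@"
}
```

Usage: `group_sorted infile`. Because it only compares each row with the one before it, an unordered file would open the same element more than once.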
Hey, thanks for the quick response! It works perfectly for the ordered dataset.
Would it take much to adapt it to do the following?
- Ignore the header line (i.e. line 1)
- Work with an unordered flatfile, as per my edit above.
Even if not, this is great; it would have taken me hours to come up with!
$ cat flat2xml.awk
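BEGIN { FS="," }              # comma-separated input
NR==1 { next }                # skip the header line
{ D[$2,++T[$2]]=$1 }          # T[type] counts objects per type; D[type,n] holds the nth object
END {
    print "<whatever>";
    for(X in T)               # note: awk visits array keys in an unspecified order
    {
        print "\t<" X ">";
        for(N=1; N<=T[X]; N++) print "\t\t" D[X,N];
        print "\t</" X ">";
    }
    print "</whatever>";
}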
$ awk -f flat2xml.awk data
<whatever>
	<Procs>
		Proc1
		Proc2
	</Procs>
	<Views>
		View1
		View2
	</Views>
	<Tables>
		Table1
		Table2
		Table3
	</Tables>
</whatever>
$
It doesn't need to handle more than 2 columns, does it?
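One thing to be aware of: awk's `for (X in T)` loop visits array keys in an unspecified order, which is presumably why `Procs` comes out first in the output above even though `Tables` appears first in the file. If the groups should follow the order in which each type first appears, a minimal sketch is to record that order in a second, numerically indexed array. The function name `flat2xml_ordered` and the array names are hypothetical; the header skip and the `<whatever>` wrapper follow flat2xml.awk.

```shell
# Hypothetical function name; same idea as flat2xml.awk, but groups are
# printed in the order each type first appears in the input.
flat2xml_ordered() {
    awk -F',' '
        NR == 1 { next }                                # skip the header line
        {
            if (!($2 in count)) order[++ntypes] = $2    # remember first-seen order
            items[$2, ++count[$2]] = $1                 # nth object of this type
        }
        END {
            print "<whatever>"
            for (i = 1; i <= ntypes; i++) {
                t = order[i]
                print "\t<" t ">"
                for (n = 1; n <= count[t]; n++) print "\t\t" items[t, n]
                print "\t</" t ">"
            }
            print "</whatever>"
        }
    ' "$@"
}
```

Usage: `flat2xml_ordered data`. With the unordered sample from the edit above, the groups come out as Tables, Procs, Views, with each group's objects in file order.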
No, it doesn't, although they have decided that the flatfile will have the two columns in the opposite order. Any suggestions on the tweak I need to make to the first script so that the output stays the same when the input is Col2,Col1 instead of Col1,Col2?
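A minimal sketch of that tweak, assuming the new layout is `Type,Object` with a header line to skip: in the first script, the group key moves from `$NF` to `$1` and the object name moves from `$1` to `$2`. The function name `group_sorted_swapped` is hypothetical, and like the original one-liner it still assumes the rows are already grouped by type.

```shell
# Hypothetical function name. Input columns are now Type,Object.
group_sorted_swapped() {
    awk -F',' '
        NR == 1 { next }                  # skip the header line (Type,Object)
        prev != $1 {                      # type ($1) changed: close old group, open new one
            if (prev != "") print "</" prev ">"
            print "<" $1 ">"
            print " " $2                  # object name is now the 2nd field
            prev = $1
            next
        }
        { print " " $2 }                  # same type: just print the object name
        END { if (prev != "") print "</" prev ">" }
    ' "$@"
}
```

Usage: `group_sorted_swapped infile`. The same field swap ($2 as the key, $1 as the value, reversed) applies to flat2xml.awk if the unordered case is needed.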