Create an XML file for each row of a CSV file

I have a CSV file that looks like this:

File,Name,birthdate,Amount
File1.xml,Name1,01.02.19,1000
File2.xml,Name2,01.02.20,1000
File3.xml,Name3,01.02.21,1000

I need to turn it into an XML file for each row. My ultimate goal is for File1.xml to look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
  <properties>
    <entry key="cm:name">Name1</entry>
    <entry key="cm:birthdate">1899-12-30</entry>
    <entry key="cm:amount">$1,000.00</entry>	
  </properties>

Not this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<name>Name1</name>
	<birthdate>1899-12-30</birthdate>
	<amount>$1,000.00</amount>
</properties>

This will be used for the bulk content import, as a metadata file in Alfresco:
Preparing the Source Content · pmonks/alfresco-bulk-import Wiki · GitHub

Something to start with:

awk -F, -f lx.awk myFile.csv

where lx.awk is:

BEGIN {
  qq="\""
}
FNR==1{
  for(i=1;i<=NF;i++)
    tags[i]=tolower($i)
  print "<?xml version=" qq "1.0" qq "encoding=" qq "UTF-8" qq "?>\n<!DOCTYPE properties SYSTEM " qq "http://java.sun.com/dtd/properties.dtd" qq ">"
}
{
  print "\t<properties>"
  for(i=2;i<=NF;i++)
    printf("\t\t<entry key=%scm=%s%s>%s</entry\n", qq, tags[i], qq, $i)
  print "\t</properties>"
}
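The header loop only works if tags is an array indexed by column number (tags[i]), so that column i of every later row can be looked up by the same index when the entries are printed. A minimal sketch of just that pairing step, assuming a small sample file called sample.csv (a hypothetical name) created on the fly:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input, same layout as the CSV in the question.
cat > sample.csv <<'EOF'
File,Name,birthdate,Amount
File1.xml,Name1,01.02.19,1000
EOF

awk -F, '
FNR == 1 {
  # tags[1]="file", tags[2]="name", ... one entry per header column
  for (i = 1; i <= NF; i++)
    tags[i] = tolower($i)
  next
}
{
  # Pair each data field with the header name of the same column
  for (i = 2; i <= NF; i++)
    printf("cm:%s = %s\n", tags[i], $i)
}' sample.csv > pairs.txt
cat pairs.txt
```

For the sample row this prints three lines: cm:name = Name1, cm:birthdate = 01.02.19 and cm:amount = 1000.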

@vgersh99 Thank you so much for your response and for the time you spent figuring out my problem. The output looks good, but a few small modifications still need to be made. BTW, this will be used as the properties of a file, or what we could call metadata, which will be bulk-imported into Alfresco using alfresco-bulk-import.

BEGIN {
  qq="\""
}
FNR==1{
  for(i=1;i<=NF;i++)
    tags[i]=tolower($i)
  print "<?xml version=" qq "1.0" qq "encoding=" qq "UTF-8" qq "?>\n<!DOCTYPE properties SYSTEM " qq "http://java.sun.com/dtd/properties.dtd" qq ">"
}
{
  print "\t<properties>"
  for(i=2;i<=NF;i++)
    printf("\t\t<entry key=%scm:%s%s>%s</entry>\n", qq, tags[i], qq, $i)
  print "\t</properties>"
}

Here's the output:

<?xml version="1.0"encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
        <properties>
                <entry key="cm:name">Name</entry>
                <entry key="cm:birthdate">Birthdate</entry>
                <entry key="cm:amount">Amount</entry>
        </properties>
        <properties>
                <entry key="cm:name">Name1</entry>
                <entry key="cm:birthdate">01.02.19</entry>
                <entry key="cm:amount">1000</entry>
        </properties>
        <properties>
                <entry key="cm:name">Name2</entry>
                <entry key="cm:birthdate">01.02.20</entry>
                <entry key="cm:amount">1000</entry>
        </properties>
        <properties>
                <entry key="cm:name">Name3</entry>
                <entry key="cm:birthdate">01.02.21</entry>
                <entry key="cm:amount">1000</entry>
        </properties>
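The extra first <properties> block (with Name, Birthdate and Amount as values) is the header line itself: the FNR==1 rule runs for line 1, but because it does not end with next, the unconditional second rule processes line 1 as well. A small sketch contrasting the two behaviors, on a hypothetical two-line sample.csv:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sample.csv <<'EOF'
File,Name,birthdate,Amount
File1.xml,Name1,01.02.19,1000
EOF

# Without next: the header line also reaches the second rule.
awk -F, 'FNR == 1 { print "header seen" } { print "data:", $2 }' sample.csv > without_next.txt

# With next: awk stops processing line 1 after the first rule.
awk -F, 'FNR == 1 { print "header seen"; next } { print "data:", $2 }' sample.csv > with_next.txt

cat without_next.txt with_next.txt
```

The first run prints "data: Name" for the header as well as "data: Name1"; the second prints only "data: Name1".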

My expected output is one XML file per row of the CSV file. The first column is the actual filename:

File,Name,Birthdate,Amount
File1.xml,Name1,01.02.19,1000
File2.xml,Name2,01.02.20,1000
File3.xml,Name3,01.02.21,1000

That is, File1.xml, File2.xml, and so on. For example, File1.xml should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
        <properties>
                <entry key="cm:name">Name1</entry>
                <entry key="cm:birthdate">01.02.19</entry>
                <entry key="cm:amount">1000</entry>
        </properties>
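Producing one file per row relies on awk's output redirection: print ... > out writes to the file whose name is in out (here taken from column 1), truncating it on first use and appending for the rest of the run while it stays open. A minimal sketch, assuming the names in column 1 are safe to create in the current directory; sample.csv is a hypothetical input name, and close() is called after each row so a long CSV does not exhaust the number of files awk can keep open at once:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sample.csv <<'EOF'
File,Name,birthdate,Amount
File1.xml,Name1,01.02.19,1000
File2.xml,Name2,01.02.20,1000
EOF

awk -F, '
FNR == 1 { next }            # skip the header line
{
  out = $1                   # column 1 holds the target filename
  print "<properties>" > out
  print "\t<entry key=\"cm:name\">" $2 "</entry>" > out
  print "</properties>" > out
  close(out)                 # finish this file before moving to the next row
}' sample.csv
```

After it runs, File1.xml and File2.xml each contain a three-line <properties> block with their own name.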

And, if possible, when there are 5th, 6th, 7th, 8th, and 9th columns, drop the cm: prefix for those keys, so that each XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
        <properties>
                <entry key="cm:name">Name3</entry>
                <entry key="cm:birthdate">01.02.21</entry>
                <entry key="cm:amount">1000</entry>
                <entry key="separator"></entry>
                <entry key="namespace"></entry>
                <entry key="parentAssociation"></entry>			
                <entry key="type"></entry>
                <entry key="aspects"></entry>					
        </properties>

CSV File:

File,Name,birthdate,Amount,separator,namespace,parentAssociation,type,aspects
File1.xml,Name1,01.02.19,1000,,,,
File2.xml,Name2,01.02.20,1000,,,,
File3.xml,Name3,01.02.21,1000,,,,
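The header above has nine names, but each data row ends after four values plus four commas, so the rows have only eight fields. A quick way to catch this kind of mismatch before generating files is to compare each line's NF against the header's; a sketch, assuming the CSV is saved as input.csv (a hypothetical name):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical copy of the CSV from the question (data rows are one comma short).
cat > input.csv <<'EOF'
File,Name,birthdate,Amount,separator,namespace,parentAssociation,type,aspects
File1.xml,Name1,01.02.19,1000,,,,
File2.xml,Name2,01.02.20,1000,,,,
EOF

# Report every line whose field count differs from the header line.
awk -F, '
FNR == 1 { want = NF; next }
NF != want { printf("line %d: %d fields, header has %d\n", FNR, NF, want) }
' input.csv > report.txt
cat report.txt
```

Both data lines are reported as having 8 fields against a 9-field header.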

Hello lxdorney,

Could you please try the following? I haven't tested it, though.

awk -v FS="," 'BEGIN{
  qq="\""
}
FNR==1{
  for(i=1;i<=NF;i++)
    tags[i]=tolower($i)
  print "<?xml version=" qq "1.0" qq "encoding=" qq "UTF-8" qq "?>\n<!DOCTYPE properties SYSTEM " qq "http://java.sun.com/dtd/properties.dtd" qq ">"
}
{
  print "\t<properties>"
  for(i=2;i<=4;i++){
    printf("\t\t<entry key=%scm:%s%s>%s</entry>\n", qq, tags[i], qq, $i)
  }
  for(i=5;i<=NF;i++){
    printf("\t\t<entry key=%s%s%s>%s</entry>\n", qq, tags[i], qq, $i)
  }  
  print "\t</properties>"
}'    Input_file

NOTE: You haven't set FS=",", which may cause issues since your Input_file uses a comma as its delimiter, so I have added -v FS="," to my code.

Thanks,
R. Singh
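For reference, awk -F, and awk -v FS="," both set the field separator to a comma before the first line is read, so the first reply's invocation (awk -F, -f lx.awk myFile.csv) already splits on commas; only when neither is given does awk fall back to its default of splitting on blanks. A small check, using a hypothetical one-line row.csv:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'File1.xml,Name1,01.02.19,1000\n' > row.csv

# Both invocations set FS to a comma before the first record is read.
awk -F, '{ print NF, $2 }' row.csv > a.txt
awk -v FS="," '{ print NF, $2 }' row.csv > b.txt
# Without either, FS is the default (runs of blanks), so the whole line is one field.
awk '{ print NF, $2 }' row.csv > c.txt
cat a.txt b.txt c.txt
```

The first two runs both print "4 Name1"; the third sees only one field.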


Didn't you say you need a separate XML file for each row? Try:

awk -v FS="," '
FNR == 1        {split (tolower($0), tags)
                 next
                }
                {print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $1
                 print "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">" > $1

                 print "\t<properties>" > $1

                 CM = "cm:"
                 for (i=2; i<=NF; i++)  {printf ("\t\t<entry key=\"%s%s\">%s</entry>\n", CM, tags[i], $i) > $1
                                         if (i == 4) CM = ""
                                        }
                 print "\t</properties>" > $1
                }
'    file

and check the result files. Your first post was not quite consistent in its use of "birthdate" and "birthday". Also, your sample file in post #3 is one comma separator short.
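A variant of the script above, offered as a sketch rather than a tested drop-in: it loops up to the number of header columns instead of NF, so a row that is a comma short still gets every key (with an empty value); it keeps the header spelling for columns 5 onward, because the expected output uses parentAssociation while tolower() would produce parentassociation; it escapes &, < and > in the values; and it closes each file after writing it. The sample data, the input.csv name, and the helper function xml_escape are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: the data row is one comma short of the 9-column header.
cat > input.csv <<'EOF'
File,Name,birthdate,Amount,separator,namespace,parentAssociation,type,aspects
File1.xml,Smith & Sons,01.02.19,1000,,,,
EOF

awk -F, '
# Escape characters that must not appear literally in XML text.
function xml_escape(s) {
  gsub(/&/, "\\&amp;", s)
  gsub(/</, "\\&lt;", s)
  gsub(/>/, "\\&gt;", s)
  return s
}
FNR == 1 {
  n = NF                                    # number of header columns
  for (i = 1; i <= n; i++)
    tags[i] = (i <= 4) ? tolower($i) : $i   # cm: keys lower-cased, Alfresco keys kept as written
  next
}
{
  out = $1                                  # target filename from column 1
  print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > out
  print "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">" > out
  print "<properties>" > out
  for (i = 2; i <= n; i++) {                # loop to n, not NF, so short rows get every key
    prefix = (i <= 4) ? "cm:" : ""
    printf("\t<entry key=\"%s%s\">%s</entry>\n", prefix, tags[i], xml_escape($i)) > out
  }
  print "</properties>" > out
  close(out)                                # avoid running out of open file handles
}' input.csv
cat File1.xml
```

For the sample row it writes File1.xml with the name as Smith &amp; Sons and all five Alfresco keys present with empty values.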


Thank you so much, guys. I really appreciate the time and effort you put into solving the problem.