Convert CSV file to nested XML file using UNIX/PERL?

we have a CSV which i need to convert to XML using Perl or Unix shell scripting.

I was able to build this XML in oracle database. However, SQL/XML query is running for long time. Hence, I'm considering to write a Perl or shell script to generate this XML file. Basically need to build this XML hierarchically by avoiding the duplicate values. Please take a look into the XML file below.

Input data below:-

RLTP_ID,RLTP_NAME,PROD_ID,PROD_NAME,CUST_ID,CUST_NAME,ACC_ID,ACC_NAME,ACC_BALANCE,TXN_ID,TXN_AMT,ACC_TYPE,ACC_CODE,TXN_CODE
10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,YY,11
10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,YY,12
10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,ZZ,11
10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,ZZ,12
10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,AA,11
10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,AA,12
10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,AA,13
10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,BB,11
10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,BB,12
10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,BB,13

XML File:-

<transactiondetails>
  <rltp_id>10</rltp_id>
  <rltp_name>Phil</rltp_name>
  <product>
    <prod_id>1</prod_id>
    <prod_name>Personal</prod_name>
    <customer>
      <cust_id>2</cust_id>
      <cust_name>Fixed</cust_name>
      <account>
        <acc_id>3</acc_id>
        <acc_name>Savings</acc_name>
        <acc_balance>3000</acc_balance>
        <acc_type>X</acc_type>
        <acc_code>ZZ</acc_code>
        <acc_code>YY</acc_code>
        <transaction>
          <txntrack>
            <txn_id>4</txn_id>
            <txn_amt>500</txn_amt>
            <txn_code>11</txn_code>
            <txn_code>12</txn_code>
          </txntrack>
        </transaction>
        <transaction>
          <txntrack>
            <txn_id>5</txn_id>
            <txn_amt>500</txn_amt>
            <txn_code/>
          </txntrack>
        </transaction>
        <transaction>
          <txntrack>
            <txn_id>6</txn_id>
            <txn_amt>500</txn_amt>
            <txn_code/>
          </txntrack>
        </transaction>
      </account>
      <account>
        <acc_id>7</acc_id>
        <acc_name>Savings</acc_name>
        <acc_balance>3000</acc_balance>
        <acc_type>X</acc_type>
        <acc_code>AA</acc_code>
        <acc_code>BB</acc_code>
        <transaction>
          <txntrack>
            <txn_id>8</txn_id>
            <txn_amt>500</txn_amt>
            <txn_code>11</txn_code>
            <txn_code>12</txn_code>
            <txn_code>13</txn_code>
          </txntrack>
        </transaction>
      </account>
    </customer>
  </product>
</transactiondetails>

Any idea would be great. Thank you.

---------- Post updated at 06:56 AM ---------- Previous update was at 04:14 AM ----------

im writing a shell script with nested for loop or nested while loop to generate this XML file. Please let me know if there is any other efficient way to process this CSV file. this CSV file contains total 20 million records. As you know looping through the 20 million records will take much time.

Appreciate your help.

If the data can be sorted hierarchically first (man sort) on the key starting with rltp_id, then prod_id, then cust_id, then tnx_id... then you can write script in a single pass to convert the data (trying to give hints rather than doing your homework).

20 million records... well... takes more space than most, but very doable...

You could use perl or awk to do this (or python or ruby or whatever)... You could use shell (bourne shell script variants), but might not be as easy especially since you want it to be relatively fast and also avoid duplicates.

I'm not trying to be mean, just trying to encourage your own work.

If you get too frustrated I might whip up a bourne (pure) example just as a challenge....

Thanks cjcox!

Yes data is sorted and written into a CSV file. Finally i have built a korn shell script to generate this XML file.

This is how i'm doing. Please correct me if i'm wrong.

Loop through each and every record
compare the current id columns(rltp_id,prod_id,cust_id,acc_id,txn_id) shell variable with previous variable id column value within the loop to avoid the duplicate XML elements. If there is any other simpler method in Korn shell. Please provide your ideas. We have lot of space and RAM in our server.
1TB
16GB RAM

However, my question is how much time would take to process 20 million records. if it completes in 20 minutes that is really great. :slight_smile:

Yes, we can build XML file by using Perl module XML::Writer,
here is the link for more details with script
XML - Convert CSV file into XML file in Perl