Pass some data from csv to xml file using shell/python

Zam_1234 · March 29, 2015, 3:03pm

Hello gurus,
I have a csv file with a bunch of data in each column (see attached).

I also have an .xml file with the structure below:

<?xml version="1.0" ?>
<component id="root" name="root">
    <component id="system" name="system">
        <param name="number_of_A" value="8"/>
        <param name="number_of_B" value="0"/>
        <param name="number_of_C" value="0"/>
        <param name="number_of_D" value="4"/> <!-- This number means how many L2 clusters in each cluster there can be multiple banks/ports -->
        <param name="number_of_E" value="0"/> <!-- This number means how many L3 clusters -->
        <param name="number_of_F" value="1"/>
        <param name="homogeneous_G" value="1"/><!--1 means homo -->
        <param name="homogeneous_H" value="1"/>
       <component id="system.core0" name="core0">
             <param name="RAS_size" value="32"/>                        
            <!-- general stats, defines simulation periods;require total, idle, and busy cycles for senity check  -->
            <!-- please note: if target architecture is X86, then all the instrucions refer to (fused) micro-ops -->
               <stat name="total_instructions" value="3765298576.0"/> <!-- CONX -->
               <stat name="N" value="3700000000.0"/> 
               <stat name="fp_instructions" value="64312000.0"/> 
            <stat name="branch_instructions" value="0"/>
            <stat name="branch_mispredictions" value="0"/>
               <stat name="load_instructions" value="1129589573.0"/> 
               <stat name="store_instructions" value="376529857.6"/> 
               <stat name="committed_instructions" value="3765298576.0"/>
               <stat name="committed_int_instructions" value="3700986576.0"/>
               <stat name="committed_fp_instructions" value="64312000.0"/> 
            <stat name="pipeline_duty_cycle" value="0.6"/>
            <stat name="L" value="100000"/>
            <stat name="idle_cycles" value="0"/>
            <stat name="busy_cycles"  value="100000"/>
            <stat name="ROB_reads" value="263886"/>
            <stat name="ROB_writes" value="263886"/>
            
            <stat name="rename_accesses" value="263886"/>
            <stat name="fp_rename_accesses" value="263886"/>
            
            <stat name="inst_window_reads" value="263886"/>
            <stat name="inst_window_writes" value="263886"/>
            <stat name="inst_window_wakeup_accesses" value="263886"/>
            <stat name="fp_inst_window_reads" value="263886"/>
            <stat name="fp_inst_window_writes" value="263886"/>
            <stat name="fp_inst_window_wakeup_accesses" value="263886"/>
            <!--  RF accesses -->
            <stat name="int_regfile_reads" value="1600000"/>
            <stat name="float_regfile_reads" value="40000"/>
            <stat name="int_regfile_writes" value="800000"/>
            <stat name="float_regfile_writes" value="20000"/>
            
            <stat name="function_calls" value="5"/>
            <stat name="context_switches" value="260343"/>
          
               <stat name="M" value="753059715.3"/> 
               <stat name="fpu_accesses" value="57880800.0"/> 
               <stat name="mul_accesses" value="57880800.0"/>
            <stat name="cdb_alu_accesses" value="1000000"/>
            <stat name="cdb_mul_accesses" value="0"/>
            <stat name="cdb_fpu_accesses" value="0"/>

My goal is to take each row of the .csv, match its column names to the node names in this xml (stripping the "system." prefix if necessary), change the corresponding "value" in the xml, and generate a new xml file for each row in the csv.

I have looked at different posts on the net, but couldn't get them to work for my case.

Is an xml file indentation sensitive, like Python?

Thanks in advance.
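
On the indentation question: unlike Python, XML does not treat indentation as syntax. Whitespace between elements is only text that a parser keeps or skips, so an indented document and a single-line one carry the same elements and attributes. A minimal check with Python's standard `xml.etree.ElementTree` module, using a shortened, made-up version of the file above (the `params` helper is hypothetical, for illustration only):

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real file, once indented and once on a single line.
indented = """<component id="root" name="root">
    <component id="system" name="system">
        <param name="number_of_A" value="8"/>
    </component>
</component>"""

flat = ('<component id="root" name="root"><component id="system" name="system">'
        '<param name="number_of_A" value="8"/></component></component>')


def params(xml_text):
    """Return {name: value} for every <param> element in the document."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("param")}


# Both layouts yield the same parameters.
same = params(indented) == params(flat)
print(same)
```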

Chubler_XL · March 29, 2015, 5:44pm

Can you show the expected output for your sample data?

Zam_1234 · March 29, 2015, 10:03pm

Expected new xml:

<?xml version="1.0" ?>
<component id="root" name="root">
    <component id="system" name="system">
        <param name="number_of_A" value="$colum1"/>
        <param name="number_of_B" value="$colum2"/>
        <param name="number_of_C" value="$colum3"/>
        <param name="number_of_D" value="$colum4"/> <!-- This number means how many L2 clusters in each cluster there can be multiple banks/ports -->
        <param name="number_of_E" value="$colum5"/> <!-- This number means how many L3 clusters -->
        <param name="number_of_F" value="$colum6"/>
        <param name="homogeneous_G" value="$colum7"/><!--1 means homo -->
        <param name="homogeneous_H" value="1"/>
        <component id="system.core0" name="core0">
            <param name="RAS_size" value="32"/>
            <!-- general stats, defines simulation periods;require total, idle, and busy cycles for senity check  -->
            <!-- please note: if target architecture is X86, then all the instrucions refer to (fused) micro-ops -->
            <stat name="total_instructions" value="3765298576.0"/>
            <stat name="N" value="3700000000.0"/>
            <stat name="fp_instructions" value="64312000.0"/>
            <stat name="branch_instructions" value="0"/>
            <stat name="branch_mispredictions" value="0"/>
            <stat name="load_instructions" value="1129589573.0"/>
            <stat name="store_instructions" value="376529857.6"/>
            <stat name="committed_instructions" value="3765298576.0"/>
            <stat name="committed_int_instructions" value="3700986576.0"/>
            <stat name="committed_fp_instructions" value="64312000.0"/>
            <stat name="pipeline_duty_cycle" value="0.6"/>
            <stat name="L" value="100000"/>
            <stat name="idle_cycles" value="0"/>
            <stat name="busy_cycles"  value="100000"/>
            <stat name="ROB_reads" value="263886"/>
            <stat name="ROB_writes" value="263886"/>
            <stat name="rename_accesses" value="263886"/>
            <stat name="fp_rename_accesses" value="263886"/>
            <stat name="inst_window_reads" value="263886"/>
            <stat name="inst_window_writes" value="263886"/>
            <stat name="inst_window_wakeup_accesses" value="263886"/>
            <stat name="fp_inst_window_reads" value="263886"/>
            <stat name="fp_inst_window_writes" value="263886"/>
            <stat name="fp_inst_window_wakeup_accesses" value="263886"/>
            <!--  RF accesses -->
            <stat name="int_regfile_reads" value="1600000"/>
            <stat name="float_regfile_reads" value="40000"/>
            <stat name="int_regfile_writes" value="800000"/>
            <stat name="float_regfile_writes" value="20000"/>
            <stat name="function_calls" value="5"/>
            <stat name="context_switches" value="260343"/>
            <stat name="M" value="$colum8"/>
            <stat name="fpu_accesses" value="57880800.0"/>
            <stat name="mul_accesses" value="57880800.0"/>
            <stat name="cdb_alu_accesses" value="1000000"/>
            <stat name="cdb_mul_accesses" value="0"/>
            <stat name="cdb_fpu_accesses" value="0"/>

Each "value" shown as $columN will be filled in from the .csv for the corresponding "name".

Chubler_XL · March 29, 2015, 10:45pm

I posted some code in this thread: Data formatting in CSV file to EXCEL, that should directly apply here.

First, create a template file like the one below and edit it as you see fit. Note the HEADER, ROW and FOOTER sections and
the special field tags of the form %%FIELD#nn%%, where nn is the two-digit CSV column number:

--HEADER--
<?xml version="1.0" ?> <component id="root" name="root">
--ROW--
<component id="system" name="system">
        <param name="number_of_A" value="%%FIELD#01%%"/>
        <param name="number_of_B" value="%%FIELD#02%%"/>
        <param name="number_of_C" value="%%FIELD#03%%"/>
        <param name="number_of_D" value="%%FIELD#04%%"/> <!-- This number means how many L2 clusters in each cluster there can be multiple banks/ports -->
        <param name="number_of_E" value="%%FIELD#05%%"/> <!-- This number means how many L3 clusters -->
        <param name="number_of_F" value="%%FIELD#06%%"/>
        <param name="homogeneous_G" value="%%FIELD#07%%"/><!--1 means homo -->
        <param name="homogeneous_H" value="1"/>
       <component id="system.core0" name="core0">
             <param name="RAS_size" value="32"/>
            <!-- general stats, defines simulation periods;require total, idle, and busy cycles for senity check  -->
            <!-- please note: if target architecture is X86, then all the instrucions refer to (fused) micro-ops -->
               <stat name="total_instructions" value="3765298576.0"/> <!-- CONX -->
               <stat name="N" value="3700000000.0"/>
               <stat name="fp_instructions" value="64312000.0"/>
            <stat name="branch_instructions" value="0"/>
            <stat name="branch_mispredictions" value="0"/>
               <stat name="load_instructions" value="1129589573.0"/>
               <stat name="store_instructions" value="376529857.6"/>
               <stat name="committed_instructions" value="3765298576.0"/>
               <stat name="committed_int_instructions" value="3700986576.0"/>
               <stat name="committed_fp_instructions" value="64312000.0"/>
            <stat name="pipeline_duty_cycle" value="0.6"/>
            <stat name="L" value="100000"/>
            <stat name="idle_cycles" value="0"/>
            <stat name="busy_cycles"  value="100000"/>
            <stat name="ROB_reads" value="263886"/>
            <stat name="ROB_writes" value="263886"/>

            <stat name="rename_accesses" value="263886"/>
            <stat name="fp_rename_accesses" value="263886"/>

            <stat name="inst_window_reads" value="263886"/>
            <stat name="inst_window_writes" value="263886"/>
            <stat name="inst_window_wakeup_accesses" value="263886"/>
            <stat name="fp_inst_window_reads" value="263886"/>
            <stat name="fp_inst_window_writes" value="263886"/>
            <stat name="fp_inst_window_wakeup_accesses" value="263886"/>
            <!--  RF accesses -->
            <stat name="int_regfile_reads" value="1600000"/>
            <stat name="float_regfile_reads" value="40000"/>
            <stat name="int_regfile_writes" value="800000"/>
            <stat name="float_regfile_writes" value="20000"/>

            <stat name="function_calls" value="5"/>
            <stat name="context_switches" value="260343"/>

            <stat name="M" value="%%FIELD#08%%"/>
            <stat name="fpu_accesses" value="57880800.0"/>
            <stat name="mul_accesses" value="57880800.0"/>
            <stat name="cdb_alu_accesses" value="1000000"/>
            <stat name="cdb_mul_accesses" value="0"/>
            <stat name="cdb_fpu_accesses" value="0"/>
       </component>
</component>
--FOOTER--
</component>

And here is your code:

awk -F, '
FNR==NR&&/^--/ {section++; next}
FNR==NR{block[section]=(block[section]?block[section]"\n":"") $0;next}
FNR==1{print block[1]}
{ out=block[2]
  for(i=1;i<=NF;i++) gsub(sprintf("%%%%FIELD#%02d%%%%",i),$i,out);
  print out
}   
END{print block[3]}' template sample_Core0.csv

Zam_1234 · March 30, 2015, 4:25am

Thanks @Chubler_XL

I was trying to do this in Python. I started with a sample replacement of just 2 variables instead of all of them. It worked, but I was hoping to write out one xml file for each row in the csv file. My code only writes 1 xml file, with the first row data from the csv. Here is the code:

import csv
input_file_name = "Core_0.csv"
template_file = "Niagara1.xml"
output_file = "{}_%s.xml" %input_file_name

with open(template_file, "rb") as temp_file:
    template = temp_file.read()

with open(input_file_name, "rU") as csv_f:
    my_reader = csv.DictReader(csv_f)
    for row in my_reader:
        with open(output_file.format(row['system.total_cycles'],row['system.busy_cycles']), "wb") as current_out:
            current_out.write(template.format(total_cycle=row["system.total_cycles"],busy_cycle=row["system.busy_cycles"]))

Why does this code give me just 1 xml file instead of one for every line in the csv?
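
One likely cause, assuming the system.total_cycles column holds the same value in every row: the file name pattern contains a single "{}", so only the first argument passed to format() ends up in the name, and every row then opens (and overwrites) the same file. The sketch below reproduces the name construction with two made-up rows and shows one possible way to keep the names unique; the row values and the "_rowNNN" naming scheme are assumptions for illustration.

```python
input_file_name = "Core_0.csv"

# %-formatting runs first and leaves a single "{}" in the pattern.
output_file = "{}_%s.xml" % input_file_name
print(output_file)  # {}_Core_0.csv.xml

# Hypothetical rows: same total_cycles, different busy_cycles.
rows = [
    {"system.total_cycles": "100000", "system.busy_cycles": "90000"},
    {"system.total_cycles": "100000", "system.busy_cycles": "80000"},
]

# str.format() ignores surplus positional arguments, so only total_cycles
# reaches the file name and both rows map to the same name.
names = [
    output_file.format(r["system.total_cycles"], r["system.busy_cycles"])
    for r in rows
]
print(names)

# One possible fix: number the files by row so the names cannot collide.
unique_names = [
    "%s_row%03d.xml" % (input_file_name, n)
    for n, _ in enumerate(rows, start=1)
]
print(unique_names)
```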

Chubler_XL · March 30, 2015, 4:27pm

The Python code below is pretty much equivalent to the awk code from post #4; use the same template file layout as posted above:

import csv

input_file_name = "sample_Core0.csv"
template_file = "template"
output_file = "%s.xml" % input_file_name

# section[1] = HEADER, section[2] = ROW, section[3] = FOOTER
snum = 0
section = [""] * 4

# Split the template into its sections at the "--NAME--" marker lines.
with open(template_file, "r") as temp_file:
    for tline in temp_file:
        if tline.startswith("--"):
            snum += 1
        else:
            section[snum] += tline

with open(input_file_name, newline="") as csv_f, open(output_file, "w") as current_out:
    current_out.write(section[1])
    reader = csv.reader(csv_f)
    next(reader, None)  # skip the CSV header row
    for row in reader:
        rowbuf = section[2]
        # enumerate() keeps the fields in column order: 1, 2, 3, ...
        for fnum, value in enumerate(row, start=1):
            rowbuf = rowbuf.replace("%%%%FIELD#%02d%%%%" % fnum, value)
        current_out.write(rowbuf)
    current_out.write(section[3])
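
For the original goal of matching CSV column names to the name attributes and producing one xml per row, a parser-based approach is another option. The following is only a sketch under stated assumptions: it uses Python's standard `xml.etree.ElementTree`, the `fill_template` helper and the sample template and CSV contents are made up for illustration, and elements are matched by name alone, so identically named stats in different components would all be updated.

```python
import csv
import io
import xml.etree.ElementTree as ET


def fill_template(xml_text, row, prefix="system."):
    """Return xml_text with value= replaced wherever name= matches a CSV column."""
    root = ET.fromstring(xml_text)
    # Strip the prefix from the column names so they match the name= attributes.
    wanted = {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in row.items()
    }
    for elem in root.iter():
        if elem.tag in ("param", "stat") and elem.get("name") in wanted:
            elem.set("value", wanted[elem.get("name")])
    return ET.tostring(root, encoding="unicode")


# Small stand-ins for the real template and CSV (contents are made up).
template = """<component id="root" name="root">
  <component id="system" name="system">
    <param name="number_of_A" value="8"/>
    <stat name="busy_cycles" value="100000"/>
  </component>
</component>"""

csv_text = "system.number_of_A,system.busy_cycles\n4,90000\n2,80000\n"

outputs = [
    fill_template(template, row)
    for row in csv.DictReader(io.StringIO(csv_text))
]

for n, xml_out in enumerate(outputs, start=1):
    # In real use, write each result to its own file, e.g. "row_%03d.xml" % n.
    print("--- row %d ---" % n)
    print(xml_out)
```

Note that the default ElementTree parser drops comments and the XML declaration, so the text-template approach above is the safer choice if the comments in the file need to survive.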