Writing xml from excel sheet .xls using perl script

Hi all.

I am working on the below requirement of generating .xml file from .xls file which i have , can someone please help me or in writing the perl script for the same:

The xls file format is as below which has two columns and number of rows are not fixed:
Fixlet Name ID
MS14-064: Vulnerabilities in Windows OLE Could Allow Remote Code Execution - Windows Server 2008 SP2 - KB3006226 (x64) 1406401
MS14-052: Vulnerabilities in Windows OLE Could Allow Remote Code Execution - Windows Server 2008 SP2 - KB3006122 (x64) 45678

The required .xml format is as below :

<?xml version="1.0" encoding="UTF-8"?>
<BES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BES.xsd">
<Baseline>
<Title>Test-bigfix</Title>
<Description><![CDATA[<enter a description of the baseline here> ]]></Description>
<Relevance>true</Relevance>
<Category></Category>
<Source>Internal</Source>
<SourceID></SourceID>
<SourceReleaseDate>2014-11-12</SourceReleaseDate>
<SourceSeverity></SourceSeverity>
<CVENames></CVENames>
<SANSID></SANSID>
<MIMEField>
<Name>x-fixlet-modification-time</Name>
<Value>Wed, 12 Nov 2014 14:32:18 +0000</Value>
</MIMEField>
<Domain>BESC</Domain>
<BaselineComponentCollection>
<BaselineComponentGroup>
<BaselineComponent Name="MS14-064: Vulnerabilities in Windows OLE Could Allow Remote Code Execution - Windows Server 2008 SP2 - KB3006226 (x64)" IncludeInRelevance="true" SourceSiteURL="http://sync.bigfix.com/cgi-bin/bfgather/bessecurity" SourceID="1406401" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
<BaselineComponent Name="MS14-052: Vulnerabilities in Windows OLE Could Allow Remote Code Execution - Windows Server 2008 SP2 - KB3006122 (x64)" IncludeInRelevance="true" SourceSiteURL="http://sync.bigfix.com/cgi-bin/bfgather/bessecurity" SourceID="45678" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
</BaselineComponentGroup>
</BaselineComponentCollection>
</Baseline>
</BES>

basically from excel values under fixlet name and id columns should be picked and then added into xml in above format.
Please suggest as i am very much new to perl scripting but want to learn it.

If the perl libs for Excel are too daunting, you might convert the sheet to CSV or tab separated text (there are several canned apps for that) and convert that using your favorite text tool.

https://www.google.com/\#q=xls\+csv\+OR\+txt\+OR\+text

@DGPickett : thanks for the reply and suggestion but actually i a looking for automation as this task will be repeated one in our enviornment and the input file which in .xls will be standard one and hence can not change it :frowning:

could you please suggest any other way by which i can automate this task of converting .xls file to predefined xml format through some scripting but without any tool...

No problem, turn each day's xls to text in a pipe stream and reformat the text to xml in a batch script; nothing interactive in this.

Here is a tool I've cobbled together with Spreadsheet::ParseExcel:

#!/usr/bin/perl

#!/usr/bin/perl -w
# For each tab (worksheet) in a file (workbook),
# spit out columns separated by ",",
# and rows separated by c/r.

use Spreadsheet::ParseExcel;
use strict;

my $filename = shift || "Book1.xls";
my $e = new Spreadsheet::ParseExcel;
my $eBook = $e->Parse($filename);
my $sheets = $eBook->{SheetCount};
my ($eSheet, $sheetName);

foreach my $sheet (0 .. $sheets - 1) {
    $eSheet = $eBook->{Worksheet}[$sheet];
    $sheetName = $eSheet->{Name};

    # Comment out the below line if you don't care about multiple worksheets
    print "#Worksheet $sheet: $sheetName\n";

    next unless (exists ($eSheet->{MaxRow}) and (exists ($eSheet->{MaxCol})));
    foreach my $row ($eSheet->{MinRow} .. $eSheet->{MaxRow}) {
        foreach my $column ($eSheet->{MinCol} .. $eSheet->{MaxCol}) {
            if (defined $eSheet->{Cells}[$row][$column])
            {
                print $eSheet->{Cells}[$row][$column]->Value . "\t";
            } else {
                print "\t";
            }
        }
        print "\n";
    }
}

It outputs tab-separated text to standard output, like:

# Worksheet 0:  Current members
column1  column2  column3
...
...
# worksheet 1:  worksheet
...
...
...

What is the environment, only windows or unix too.

Thanks all for all your input...the syetm is windows...I have below script which will accept data from xls and generate a xml data , but the problem is its also giving me column headers in the excel , i think i am doing some silly mistake in the code..can someplease please suggest me how i can exclude the columns headers appearing in the output :
below is the code :

 
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use XML::Writer;
use Time::Piece;
my $date = localtime->strftime('%Y-%m-%d');
my $date1 = localtime->strftime('%a, %d %b %Y %H:%M:%S +0000');
my @columns = ('ID', 'Name');
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('myfile.xls') or die $parser->error();
my $worksheet = $workbook->worksheet(0);
my ( $row_min, $row_max ) = $worksheet->row_range();
my @data;
for my $row ( $row_min .. $row_max ) {
    my %hash;
    for my $col (0 .. $#columns) {
        my $cell = $worksheet->get_cell( $row, $col );
        $hash{$columns[$col]} = $cell->value();
    }
    push(@data,\%hash)
}
my $writer = XML::Writer->new(OUTPUT => 'self', DATA_MODE => 1);
$writer->xmlDecl("UTF-8");
$writer->startTag("BES", "xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance", "xsi:noNamespaceSchemaLocation" => "BES.xsd");
$writer->startTag('Baseline');
$writer->dataElement(Title => "new baseline");
$writer->dataElement(Description => "");
$writer->dataElement(Relevance => "true");
$writer->dataElement(Category=> "");
$writer->dataElement(Source => "Internal");
$writer->dataElement(SourceID => "");
$writer->dataElement(SourceReleaseDate => $date);
$writer->dataElement(SourceSeverity => "");
$writer->dataElement(CVENames => "");
$writer->dataElement(SANSID => "");
$writer->startTag('MIMEField');
$writer->dataElement(Name => "x-fixlet-modification-time");
$writer->dataElement(Value => $date1);
$writer->endTag('MIMEField');
$writer->dataElement(Domain => "BESC");
$writer->startTag('BaselineComponentCollection');
$writer->startTag('BaselineComponentGroup');
for my $row (@data) {
    $writer->startTag("BaselineComponent", Name => $row->{'Name'}, IncludeInRelevance => "true", SourceSiteURL => "http://sync.bigfix.com/cgi-bin/bfgather/bessecurity",SourceID => $row->{'ID'}, ActionName => "Action1");
 $writer->dataElement(ActionScript => "");
 $writer->dataElement(Relevance => "");
 $writer->endTag('BaselineComponent');
}
 $writer->endTag('BaselineComponentGroup');
 $writer->endTag('BaselineComponentCollection');
 $writer->endTag('Baseline');
 $writer->endTag('BES');
$writer->end;
print $writer->to_string;

below is the actual output :

 
<BES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchema
Location="BES.xsd">
<Baseline>
<Title>new baseline</Title>
<Description></Description>
<Relevance>true</Relevance>
<Category></Category>
<Source>Internal</Source>
<SourceID></SourceID>
<SourceReleaseDate>2015-02-16</SourceReleaseDate>
<SourceSeverity></SourceSeverity>
<CVENames></CVENames>
<SANSID></SANSID>
<MIMEField>
<Name>x-fixlet-modification-time</Name>
<Value>Mon, 16 Feb 2015 13:56:59 +0000</Value>
</MIMEField>
<Domain>BESC</Domain>
<BaselineComponentCollection>
<BaselineComponentGroup>
<BaselineComponent Name="Name" IncludeInRelevance="true" SourceSiteURL="http://s
ync.bigfix.com/cgi-bin/bfgather/bessecurity" SourceID="ID" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
<BaselineComponent Name="MS15-010: Vulnerabilities in Windows Kernel-Mode Driver
 Could Allow Remote Code Execution - Windows Server 2008 R2 SP1 - KB3013455 (x64
)" IncludeInRelevance="true" SourceSiteURL="404 Not Found
her/bessecurity" SourceID="1501031" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
<BaselineComponent Name="MS15-011: Vulnerability in Group Policy Could Allow Rem
ote Code Execution - Windows Server 2008 R2 SP1 - KB3000483 (x64)" IncludeInRele
vance="true" SourceSiteURL="http://sync.bigfix.com/cgi-bin/bfgather/bessecurity"
 SourceID="1501121" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
<BaselineComponent Name="MS15-002: Vulnerability in Windows Telnet Service Could
 Allow Remote Code Execution - Windows Server 2003 SP2 - KB3020393" IncludeInRel
evance="true" SourceSiteURL="http://sync.bigfix.com/cgi-bin/bfgather/bessecurity
" SourceID="1500215" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
<BaselineComponent Name="MS14-064: Vulnerabilities in Windows OLE Could Allow Re
mote Code Execution - Windows Server 2003 SP2 - KB3006226" IncludeInRelevance="t
rue" SourceSiteURL="http://sync.bigfix.com/cgi-bin/bfgather/bessecurity" SourceI
D="1406453" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
<BaselineComponent Name="MS14-022: Vulnerabilities in Microsoft SharePoint Serve
r Could Allow Remote Code Execution - SharePoint Server 2013 Client Components S
DK - KB2863854 (x64)" IncludeInRelevance="true" SourceSiteURL="http://sync.bigfi
x.com/cgi-bin/bfgather/bessecurity" SourceID="1402215" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>
</BaselineComponentGroup>
</BaselineComponentCollection>
</Baseline>
</BES>

as you can see while convwerting the xls which has two column headers - Name and ID. XML output is also including it as below :

<BaselineComponent Name="Name" IncludeInRelevance="true" SourceSiteURL="http://s
ync.bigfix.com/cgi-bin/bfgather/bessecurity" SourceID="ID" ActionName="Action1">

i want the to be removed and script should only convert rest of the data excluding column headers.

Also it will be really helpful if one can let me know how i can directly pass the output in xml format in a separate file (for example : sample.bes) how to inlcude that logic in script itself or i can give a user to allow him to enter the desired file name..something like read statement in shell script ...

Thanks in advance.

Yes, you're looping through all the rows in the Perl program you wrote. If you do not want the header, don't loop through it.

1 Like

can someone please help...
@durden_tyler : could you show me how that can be done.. i want to exclude column headings but rest of the rows should be there..

can someone please provide some help to get the desired result... Thanks in advance. :slight_smile:

Try for my $row ( ($row_min+1) .. $row_max ) { to loop through one fewer row.

1 Like

Excellent...thanks a lot Corona688 .. i was trying withput the bracket around

$row_min+1

..thanks :slight_smile:

Thanks ... Now i have landed into a trouble where my script is taking blank values and trying to write it into a xml . I just want for loop to work only for nonempty rows or cells and it should complete once for loop find an empty cell or $row_max is reached..
can someone suggest what needs to be chnaged in the below script to do so :

 
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use XML::Writer;
use Time::Piece;
my $num_args = $#ARGV + 1; 
if ($num_args != 2) {     
print "\nUsage: baseline.pl Baseline_name xls_filename\n";     
exit; 
}   
my $Baseline_name=$ARGV[0]; 
my $xls_filename=$ARGV[1];
my $date = localtime->strftime('%Y-%m-%d');
my $date1 = localtime->strftime('%a, %d %b %Y %H:%M:%S +0000');
my @columns = ('ID', 'Name');
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse($xls_filename) or die $parser->error();
my $worksheet = $workbook->worksheet(0);
my ( $row_min, $row_max ) = $worksheet->row_range();
my @data;
for my $row ( ($row_min+1) .. $row_max ) {
    my %hash;
    for my $col (0 .. $#columns) {
        my $cell = $worksheet->get_cell( $row, $col );
        $hash{$columns[$col]} = $cell->value();
    }
    push(@data,\%hash)
}
my $writer = XML::Writer->new(OUTPUT => 'self', DATA_MODE => 1);
$writer->xmlDecl("UTF-8");
$writer->startTag("BES", "xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance", "xsi:noNamespaceSchemaLocation" => "BES.xsd");
$writer->startTag('Baseline');
$writer->dataElement(Title => $Baseline_name);
$writer->dataElement(Description => "");
$writer->dataElement(Relevance => "true");
$writer->dataElement(Category=> "");
$writer->dataElement(Source => "Internal");
$writer->dataElement(SourceID => "");
$writer->dataElement(SourceReleaseDate => $date);
$writer->dataElement(SourceSeverity => "");
$writer->dataElement(CVENames => "");
$writer->dataElement(SANSID => "");
$writer->startTag('MIMEField');
$writer->dataElement(Name => "x-fixlet-modification-time");
$writer->dataElement(Value => $date1);
$writer->endTag('MIMEField');
$writer->dataElement(Domain => "BESC");
$writer->startTag('BaselineComponentCollection');
$writer->startTag('BaselineComponentGroup');
for my $row (@data) {
    $writer->startTag("BaselineComponent", Name => $row->{'Name'}, IncludeInRelevance => "true", SourceSiteURL => "http://sync.bigfix.com/cgi-bin/bfgather/bessecurity",SourceID => $row->{'ID'}, ActionName => "Action1");
 $writer->dataElement(ActionScript => "");
 $writer->dataElement(Relevance => "");
 $writer->endTag('BaselineComponent');
}
 $writer->endTag('BaselineComponentGroup');
 $writer->endTag('BaselineComponentCollection');
 $writer->endTag('Baseline');
 $writer->endTag('BES');
$writer->end;
print $writer->to_string;

here input is .xls file which has two columns - ID and Name. for loop is trying to write the data into xml even for empty rows as well as below :

 
</BaselineComponent>
<BaselineComponent Name="" IncludeInRelevance="true" SourceSiteURL="http://sync.bigfix.com/cgi-bin/bfgather/bessecurity" SourceID="" ActionName="Action1">
<ActionScript></ActionScript>
<Relevance></Relevance>
</BaselineComponent>

this shouldnt be the case.