Data formatting in CSV file to EXCEL

etldev · July 22, 2014, 3:50pm

Hello friends

I want to convert an csv file on unix (which is generated by a ETL application) to a formatted excel sheet like .I have roughly like 28 columns

1)All numbers need to be stored as numbers with leading zeros-like format as text for this column to preserve leading zeroes e.g '0012'
2)All Amounts should be formatted as numbers with 2 decimals and separator e.g '123.45'
3)All dates to be formatted as MM/DD/YYYY

My users are totally business and they want this unix csv file totally formatted so that they can use it readily..

I have read forums where people suggested using perl script ..Any initial help to start with is greatly appreciated.

Corona688 · July 22, 2014, 4:34pm

Excel will mangle your formatting and replace it with whatever it pleases, especially with dates.

etldev · July 22, 2014, 5:01pm

even if its not for the dates..I just want to format the text and numbers.

Chubler_XL · July 22, 2014, 5:59pm

Why not format up your sheet in excel how you want it then save as "XML Spreadsheet 2003".

You can then edit this file and cut out the unnecessary stuff like (<DocumentProperties> <OfficeDocumentSettings> etc.), just play around by editing the file and checking it can still be loaded by excel.

As an example this is a nice 2 column sheet with leading zero format on the first column, it's a fairly simple exercise to then transform a flat csv file into this type of .xml document.

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
  </Style>
  <Style ss:ID="s63">
   <NumberFormat ss:Format="0000"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="etldev_demo">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="1" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
   <Row>
    <Cell ss:StyleID="s63"><Data ss:Type="Number">12</Data></Cell>
    <Cell><Data ss:Type="String">test text</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>

Or course there are perl CPAN modules that can assist in producing this output (Spreadsheet-WriteExcelXML comes to mind), and you may find dealing with these easier.

etldev · July 22, 2014, 6:07pm

I need this done on a daily basis like an automated things so hence I cant format it manually..

second thing is I am looking for a shell script/perl script which will take the csv on unix and give me a formatted xls with all the requirements I specified..

Chubler_XL · July 22, 2014, 6:37pm

Well what I'm thinking of is a template file where an awk program can then merge with to produce your document.

Imaging if the document above was split up into something like:

--HEADER--
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
  </Style>
  <Style ss:ID="s63">
   <NumberFormat ss:Format="0000"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="etldev_demo">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="1" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
--ROW--
   <Row>
    <Cell ss:StyleID="s63"><Data ss:Type="Number">%%FIELD#01%%</Data></Cell>
    <Cell><Data ss:Type="String">%%FIELD#02%%</Data></Cell>
   </Row>
--FOOTER--
  </Table>
 </Worksheet>
</Workbook>

Then an awk program could output the HEADER block, and for each csv line replace all the %%FIELD#nn%% values in the ROW block and finally output the FOOTER block.

---------- Post updated at 08:37 AM ---------- Previous update was at 08:13 AM ----------

The code to produce your file could be as simple as this:

awk -F, '
FNR==NR&&/^--/ {section++; next}
FNR==NR{block[section]=(block[section]?block[section]"\n":"") $0;next}
FNR==1{print block[1]}
{ out=block[2]
  for(i=1;i<=NF;i++) gsub(sprintf("%%%%FIELD#%02d%%%%",i),$i,out);
  print out
}
END{print block[3]}' template data.csv

However, if your csv has quoted fields with possible escaped characters and the like, you will need something a little smarter, like a proper csv parser.

Fortunately I have already posted one in this forum and we could use most of the code from that:

busi386 · July 23, 2014, 10:36am

Hi, can we see a sample of what this cvs data looks like?