I have more than 1000 files to parse. Each file contains few lines (number of lines varies) followed by a header line having all column's name (SPOT, NAME etc) and then values for those columns.
**Example File:
1) I need to remove all lines before the header line (for each file).
2) I need to extract 2 columns, let say 2nd (NAME) & 5th column (CHDN_MED). Since there is no values for 2nd & 4th rows for 5th column, it should give a blank space for each in output file. But its giving 6.3 and 6.7 respectively from the 6th column (I am tring with AWK).
The desired output file is :
IYPR1 3
IYPR9
IYPR11 4
IYPR13
IYPR19 7
. .
. .
**the actual files are with .xls extension and each file has more than 50 columns and 9,000 rows.
This script handles one of your files. You may change the rows to print out. (here: 2 and 5 -> RTP="2 5")
I assume that the number of rows is usually six and only row 5 can be empty. If not we need to enhance the case-structure.
You need to run the script with the filename as first parameter:
./script.sh file.txt
#! /bin/bash
RTP="2 5"
while read -a A ; do
case ${#A[@]} in
5) A[5]=${A[4]} ; A[4]="" ;;
*) : ;;
esac
L=""
for n in ${RTP} ; do
L="${L} ${A[$((N=n-1))]}"
done
echo "${L}"
done < <(awk '/SPOT/,/EOF/ {print}' ${1})
thanks guys for quick replies. @Franklin52 : yeah, all Input files are excel files. Its (script provided by you) working fine with " .txt " file but not with excel file, I afraid.
@elbrand : any row value for any column can be empty.
---------- Post updated at 12:44 PM ---------- Previous update was at 12:42 PM ----------
thanks guys for quick replies. @Franklin52 : yeah, all Input files are excel files. Its (script provided by you) working fine with " .txt " file but not with excel file, I afraid.
@elbrand : any row value for any column can be empty.
I simply created a artificial ".txt" file with same content what I have given in my 1st post. your script is working very fine on it. But when I convert ".xls" file into ".txt "or ".csv" file, its not working. Again its taking values from other columns. This is what i did :
mv 10042.xls 10042.csv
OR
mv 10042.xls 10042.txt
Also, above command is for one file. I need it in a way so that it can be run on all files.
Guys, I am still in a real fix. All I want is to extract column from excel files as it is.
If there are blanks in the column, they should be in the output.
If you want to read excel sheet and print the value of cell, you can try below perl solution (read_excel.pl).
input file - myxl.xls
field 1 field 2 field 3
test1 1 6
test2 2
test3 3 7
test4 4
test5 5 8
#!/usr/bin/perl
use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
my $sourcename = shift @ARGV;
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename) or die "Could not open source Excel file $sourcename: $!";
my $storage_book;
foreach my $source_sheet_number (0 .. $source_book->{SheetCount}-1)
{
my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];
print "--------- SHEET:", $source_sheet->{Name}, "\n";
next unless defined $source_sheet->{MaxRow};
next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
next unless defined $source_sheet->{MaxCol};
next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};
foreach my $row_index (1 .. $source_sheet->{MaxRow})
{
foreach my $col_index ($source_sheet->{MinCol} .. $source_sheet->{MaxCol})
{
if ($col_index == 0 or $col_index == 2 ) {
my $source_cell = $source_sheet->{Cells}[$row_index][$col_index];
if ($source_cell)
{
#print "( $row_index , $col_index ) =>", $source_cell->Value, "\n";
print $source_cell->Value, "\t";
} # end of source_cell check
}
} # foreach col_index
print "\n";
} # foreach row_index
} # foreach source_sheet_number