I have an excel sheet with 11 tabs. I need to take data from the first tab and write the output to the second tab. The first tab looks like this, starting from Row 3
The filters that needs to be created are
1) keep anything greater than 'POS' 5 and less than 160 AND
2) From an external csv file, read the file with headers CHROM/POS/REF/ALT/SCORE and create a new column on the this sheet called SCORE to write only the ones with score 0 or 1 or 2 (out of 6) matching CHROM/POS/REF/ALT/
3) Keep the ones with 'percent' more than 5% for those with score 0
I have written several one liners in awk for these filters with a lot of intermediate files and also converting excel sheets to csv. Would Python be a better way ? Thank you
There's an excellent Perl library for .xls spreadsheets (not xlsx afaik), SpreadSheet::ParseExcel. It can import, edit, and export. It's the usual thing used when UNIX scripts are forced to speak Microsoft. As a very rough cut to start with, perhaps:
#!/usr/bin/perl
use Spreadsheet::ParseExcel;
my $infile=shift || die("No input filename given");
my $outfile=shift || die("No output filename given");
my $e=new Spreadsheet::ParseExcel;
my $eBook = $e->Parse($filein);
my $sheet1=$eBook->{Worksheet}[0];
my $sheet2=$eBook->{Worksheet}[1];
my $row=0;
foreach my $row ($sheet1->{MinRow} .. $sheet1->{MaxRow}) {
$sheet2->write_row(0, $row++, ["value1", "value2", "value3"]);
}
$eBook->SaveAs($outfile);