f_o_555
November 28, 2011, 6:00am
1
Hi again,
I need to further process the results of a previous manipulation.
I have a file with three columns
e.g.
AAA5 0.00175 1.97996e-06
AAA5 0.01334 2.14159e-05
AAA5 0.01340 4.12155e-05
AAA5 0.01496 1.10312e-05
AAA5 0.51401 0.0175308
BB0 0.00204 2.8825e-07
BB0 0.01569 7.94746e-07
BB0 0.01578 1.51949e-06
BB0 0.01766 3.91196e-07
BB0 2.18630 3.60312e-10
BB1 1.20490 6.06204e-09
CCC5 0.00226 5.31546e-10
What I would like to do is to sum the total of the 3rd column.
then, select (print) only those line whose ratio 3rd column/total is bigger than a certain value (stored in a variable).
I tried awk, but I still don't get the way it works.
Thanks in advance,
#! /usr/bin/perl -w
use strict;
my ($line, $sum, $crtn_value, $ratio) = (0, 0, 1e-4, 0);
my (@x);
open INPUT, "< input.txt";
for $line (<INPUT>) {
@x = split /\s+/, $line;
$sum = $sum + $x[2];
}
close INPUT;
open INPUT, "< input.txt";
for $line (<INPUT>) {
@x = split /\s+/, $line;
$ratio = $x[2] / $sum;
($ratio > $crtn_value) && print "$line";
}
close INPUT;
1 Like
f_o_555
November 28, 2011, 9:29am
3
Do you think that could this be done with AWK (maybe I gain some insight how it works)?
Try this...
awk '{sum+=$3;a[$0]=$3}END{for(i in a){if(a/sum > val){print i}}}' val="0.016" input_file
--ahamed
1 Like
f_o_555
November 28, 2011, 10:32am
5
Hi ahamed,
indeed it works thank you.
It is not clear in which order the results are printed.
Is there a way to print them in the same order as they appear in the input file?
sort the output if that helps i.e. awk '{...}' input_file | sort
or
Try this...
awk '{sum+=$3;a[++j]=$0}END{for(i=1;i<=j;i++){split(a,arr);if(arr[3]/sum > val){print a}}}' val="0.016" input_file
--ahamed
1 Like
f_o_555
November 30, 2011, 10:33am
7
ahamed101:
sort the output if that helps i.e. awk '{...}' input_file | sort
or
Try this...
awk '{sum+=$3;a[++j]=$0}END{for(i=1;i<=j;i++){split(a,arr);if(arr[3]/sum > val){print a}}}' val="0.016" input_file
--ahamed
Hi this solution works better than the previous, which didn't work all the time.
e.g.
file1
eu154 1.51000000 1.70513841e+10
eu154 1.52200000 2.07052522e+09
eu154 1.52200000 2.07052522e+09
eu154 1.53140000 2.07052522e+10
eu154 1.53780000 1.70513841e+11
eu154 1.55400000 4.87182404e+09
eu154 1.59300000 3.53207243e+12
eu154 1.59650000 6.27264499e+12
awk '{sum+=$3;a[$0]=$3}END{for(i in a){if(a/sum > val){print i}}}' val=0.0 file3
was giving out
eu154 1.53140000 2.07052522e+10
eu154 1.59300000 3.53207243e+12
eu154 1.52200000 2.07052522e+09
eu154 1.59650000 6.27264499e+12
eu154 1.55400000 4.87182404e+09
eu154 1.51000000 1.70513841e+10
eu154 1.53780000 1.70513841e+11
So it was missing one of the 2 identical lines
Still I have to sort the file with sort afterwards, but the command
awk '{sum+=$3;a[++j]=$0}END{for(i=1;i<=j;i++){split(a,arr);if(arr[3]/sum > val){print a}}}' val="0.0" file3
gives
eu154 1.51000000 1.70513841e+10
eu154 1.52200000 2.07052522e+09
eu154 1.52200000 2.07052522e+09
eu154 1.53140000 2.07052522e+10
eu154 1.53780000 1.70513841e+11
eu154 1.55400000 4.87182404e+09
eu154 1.59300000 3.53207243e+12
eu154 1.59650000 6.27264499e+12
as expected