Select lines in which a column's value is greater than some percentage of the total number of lines

vaibhavkorde · April 21, 2011, 3:55am

I have a file in the following format:

 1    32    3    
 4    6    4
 4    45    1    
 45    4    61    
 54    66    4    
 5    65    51
 56    65    1
 12  32  85

Here the total number of lines is 8 (it varies from file to file).

Now I want to select only those lines in which the value in the third column is greater than 25% of the length,
i.e. here the value in the third column should be greater than 2 (25% of 8).

so the selected lines would be:

1    32    3    
4    6    4
45    4    61    
54    66    4    
5    65    51
12  32  85

I can use awk like this:

awk '{if(some condition) printf $1"\t"$2"\t"$3"\n"}'

but I can't work out how to write that condition exactly.

Please help me out.
Thanks in advance.
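One way to build that condition, as a minimal sketch: count the lines first with `wc -l` and pass the count into awk with `-v`. The file name `infile` and the variable names `n` and `pct` are assumptions for illustration; the block writes the sample data from the question to `infile` so it can run on its own.

```shell
# Write the sample data from the question to a (hypothetical) file named infile.
cat > infile <<'EOF'
 1    32    3
 4    6    4
 4    45    1
 45    4    61
 54    66    4
 5    65    51
 56    65    1
 12  32  85
EOF

# Step 1: count the lines. wc -l counts newline characters,
# so the file must end with a newline for the count to be exact.
n=$(wc -l < infile)

# Step 2: print every line whose 3rd column exceeds pct% of n
# (here 25% of 8 = 2). A pattern with no action prints the whole line.
awk -v n="$n" -v pct=25 '$3 > n * pct / 100' infile
```

Changing `pct` changes the percentage without touching the awk program.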

zaxxon · April 21, 2011, 4:11am

You did not specify which value represents the length, so I just test for > 2 as you said.

awk '$3 > 2 {printf("%-5s%-5s%-5s\n", $1, $2, $3)}' infile
1    32   3
4    6    4
45   4    61
54   66   4
5    65   51
12   32   85

vaibhavkorde · April 21, 2011, 4:13am

Actually, the length is the total number of lines in the file,
and the value in the third column has to be greater than some percentage of it.
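With the length defined as the file's line count, awk can also get that count by itself if the file is named twice on the command line: during the first pass `NR == FNR` holds, so the script only counts lines, and the second pass does the filtering. This is a sketch assuming a regular file called `infile` (hypothetical name) and a hypothetical `pct` variable; it will not work on piped input, since a pipe cannot be read twice.

```shell
# Sample data from the question, written to a hypothetical file named infile.
cat > infile <<'EOF'
 1    32    3
 4    6    4
 4    45    1
 45    4    61
 54    66    4
 5    65    51
 56    65    1
 12  32  85
EOF

# Pass 1 (NR == FNR): only count lines into n, then skip to the next line.
# Pass 2: print lines whose 3rd column is greater than pct% of n.
awk -v pct=25 'NR == FNR { n++; next } $3 > n * pct / 100' infile infile
```

Reading the file twice avoids holding every line in memory, at the cost of a second pass over the file.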

sk1418 · April 21, 2011, 4:19am

kent$ echo " 1    32    3    
 4    6    4
 4    45    1    
 45    4    61    
 54    66    4    
 5    65    51
 56    65    1
 12  32  85" | awk '{a[NR]=$3; b[NR]=$0} END{x=NR/4; for(i=1;i<=NR;i++) if (a[i]>x) print b[i]}'
 1    32    3    
 4    6    4
 45    4    61    
 54    66    4    
 5    65    51
 12  32  85
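The command above works on piped input because it stores every line and only decides in the END block, once NR is final. A sketch of the same buffer-then-filter idea with the percentage pulled out into a variable (the names `pct`, `val`, `line` and `limit` are illustrative, not from the original):

```shell
# Same buffer-then-filter technique, with the percentage as a parameter.
printf '%s\n' ' 1    32    3' ' 4    6    4' ' 4    45    1' ' 45    4    61' \
    ' 54    66    4' ' 5    65    51' ' 56    65    1' ' 12  32  85' |
awk -v pct=25 '
    { val[NR] = $3 + 0; line[NR] = $0 }   # remember each line; +0 forces a numeric 3rd field
    END {
        limit = NR * pct / 100            # NR now holds the total line count
        for (i = 1; i <= NR; i++)
            if (val[i] > limit) print line[i]
    }'
```

The trade-off is memory: the whole input is kept in the two arrays until the END block runs.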

vaibhavkorde · April 21, 2011, 4:38am

No, it won't work for every file,
because the length is simply the number of lines in the file.

sk1418 · April 21, 2011, 4:40am

Have you tried my command?
I used NR: in the END block, NR holds the number of lines in your file.
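A quick way to see what NR holds: in the main block it is the number of the current line, and in the END block it keeps its last value, which is the total line count.

```shell
# In the main block NR is the current line number; in END it is the total.
printf 'x\ny\nz\n' | awk '{ print "line", NR } END { print "total:", NR }'
```

This prints `line 1`, `line 2`, `line 3` and then `total: 3`.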

If you apply the command to a different file, it works as well, for example:

 echo "1    32    3    
4    6    4
45    4    61
54    66    4
5    65    51
12  32  85
1    32    3
4    6    4
45    4    61
54    66    4
5    65    51
12  32  85
1    32    3
4    6    4
45    4    61
54    66    4
5    65    51
12  32  85" | awk '{a[NR]=$3; b[NR]=$0} END{x=NR/4; for(i=1;i<=NR;i++) if (a[i]>x) print b[i]}'
45    4    61    
5    65    51
12  32  85
45    4    61    
5    65    51
12  32  85
45    4    61    
5    65    51
12  32  85
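Here the input has 18 lines, so the threshold is 18/4. awk arithmetic is floating point, so that is 4.5 rather than 4, which is why the rows whose third column is 4 are dropped this time. A one-line check:

```shell
# awk division is floating point: 18 / 4 gives 4.5, not 4.
awk 'BEGIN { x = 18 / 4; print x; if (4 > x) print "4 passes"; else print "4 is dropped" }'
```

This prints `4.5` followed by `4 is dropped`.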

vaibhavkorde · April 21, 2011, 4:42am

Yeah, it worked fine.
Thank you!