Filtering lines for column elements based on corresponding counts in another column

polsum · March 5, 2012, 9:21am

Hi,

I have a file like this:

ACC 2 2 21 aaa 
AC 443 3 22 aaa  
GCT 76 1 33 xxx 
TCG 34 2 33 aaa 
ACGT 33 1 22  ggg 
TTC 99 3 44 wee 
CCA 33 2 33 ggg 
AAC 1 3 55 ddd 
TTG 10 1 22 ddd 
TTGC 98 3 22 ddd 
GCT 23 1 21 sds 
GTC 23 4 32 sds
ACGT 32 2 33 vvv 
CGT 11 2 33 eee 
CCC 87 2 44 eee

As you can see, column 5 has repeated values. For each distinct value in column 5, I want to print the line with the highest column 2 value.

If the maximum column 2 value occurs more than once for a given column 5 value, print the first line on which it occurs.

So the desired output is:

AC 443 3 22 aaa  
GCT 76 1 33 xxx 
ACGT 33 1 22  ggg 
TTC 99 3 44 wee 
TTGC 98 3 22 ddd 
GCT 23 1 21 sds 
ACGT 32 2 33 vvv 
CCC 87 2 44 eee

Thanks in advance :)

bartus11 · March 5, 2012, 9:28am

Try:

awk '$2>M[$5]{M[$5]=$2;a[$5]=$0}END{for (i in a) print a[i]}' file
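
Two things to watch with a one-liner of this shape: awk does not specify the iteration order of `for (i in a)`, so the lines may not come out in the order shown in the desired output, and comparing against an unset `M[$5]` only works while column 2 is positive. Here is a rough sketch of a variant that keeps first-appearance order, assuming whitespace-separated fields and a numeric column 2. The function name `max_per_key` is made up for illustration and is not from the thread.

```shell
# max_per_key [FILE...]: hypothetical helper name, used only for this sketch.
# Prints, for each distinct column-5 value, the line with the largest column 2,
# in order of first appearance of the column-5 value.
max_per_key() {
    awk '
        # Skip blank or short lines that have no column 5.
        NF < 5 { next }
        # First time this column-5 key is seen: record its position and line.
        !($5 in max) { order[++n] = $5; max[$5] = $2 + 0; line[$5] = $0; next }
        # Strictly greater, so on ties the earliest line is kept.
        $2 + 0 > max[$5] { max[$5] = $2 + 0; line[$5] = $0 }
        # Emit the winners in first-appearance order.
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

Calling `max_per_key file` on the sample data should give the eight lines in the same order as the desired output, since the groups are emitted in the order aaa, xxx, ggg, wee, ddd, sds, vvv, eee.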