remove lines based on score criteria

Hi guys,

Please guide me to a solution.

PART-I

INPUT FILE (has 2 columns ID and score)
TC5584_1 93.9
DV161411_2 79.5
BP132435_5 46.8
EB682112_1 34.7
BP132435_4 29.5
TC13860_2 10.1

OUTPUT FILE (It shouldn't contain the line ' BP132435_4 29.5 ', as BP132435 is repeated and this line has the lower score. If an ID is repeated, only the line with the highest score should remain)
TC5584_1 93.9
DV161411_2 79.5
BP132435_5 46.8
EB682112_1 34.7
TC13860_2 10.1

PART-II

====FILE1======
TC5584_1 93.9
DV161411_2 79.5
BP132435_5 46.8
EB682112_1 34.7
TC13860_2 10.1

=====FILE2======
EB681299_3 129 269
EB425502_1 71 182
TC5584_1 66 188
BP132435_5 37 106
EB682112_1 22 150
BP132435_4 117 175
TC13860_2 16 93
DV161411_2 36 239

===OUTPUT_FILE===== (For each ID in column 1 of FILE1, it contains the corresponding row from FILE2)
TC5584_1 66 188
DV161411_2 36 239
BP132435_5 37 106
EB682112_1 22 150
TC13860_2 16 93

Your help is highly appreciated.

Thanks in advance. :slight_smile:

Try:
Part 1:

for each in $(awk -F"_" '{ print $1; }' input_file | sort -u); do grep "^${each}_" input_file | sort -k2,2n | tail -1 >>output; done

For Part 2:

for each in $(awk '{ print $1; }' file1); do awk -v id="$each" '$1 == id' file2 >>output; done

I tried the code, but it's giving an error:
Illegal variable name. :frowning:
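
"Illegal variable name." is what csh/tcsh prints when it meets `$(`, so most likely you are running the loops in csh or tcsh rather than sh/bash. The `for ... in $(...)` syntax is Bourne-shell only. One way around it is to put the commands in a script that starts with `#!/bin/sh`, so your login shell no longer matters. Below is a rough sketch that does both parts with awk alone. The function names `dedupe_by_score` and `lookup_rows` are made up for this example, and I am assuming, as in the loops above, that the base ID is everything before the first "_" and that columns are separated by whitespace.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from any tool.

# Part 1: for each base ID (text before the first "_"), keep only the
# line with the highest score. Output follows the order in which each
# base ID first appears in the input file.
dedupe_by_score() {
    awk '{
        base = $1
        sub(/_.*/, "", base)              # BP132435_5 -> BP132435
        if (!(base in best)) order[++n] = base
        if (!(base in best) || $2 + 0 > score[base]) {
            best[base] = $0               # remember the whole line
            score[base] = $2 + 0          # and its numeric score
        }
    }
    END { for (i = 1; i <= n; i++) print best[order[i]] }' "$1"
}

# Part 2: for every ID in column 1 of FILE1 ($1), print the matching
# row of FILE2 ($2), in FILE1 order. IDs not found in FILE2 are skipped.
# Assumes FILE2 is not empty.
lookup_rows() {
    awk 'NR == FNR { row[$1] = $0; next }   # first file read: FILE2
         $1 in row  { print row[$1] }' "$2" "$1"
}

# Example calls (uncomment and adjust the file names):
# dedupe_by_score input_file > file1
# lookup_rows file1 file2 > output
```

Save it as, say, `filter.sh`, uncomment the calls at the bottom and run it with `sh filter.sh`. Unlike the loops, each function reads its input only once, and the result keeps the line order of your example output.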