I have a file test.txt with the lines below:
$ cat test.txt
AAA 1
AAA 2
BBB 5
BBB 7
BBB 9
CCC 3
CCC 4
DDD 6
EEE 5
I want to filter the file above so that each value in the first column appears on only one row: whenever several rows share the same first-column value, only the row with the biggest number in the second column should be kept.
e.g. There are three rows that have "BBB" in the first column, but I want only the one with the biggest number in the second column, "BBB 9", to be displayed.
My desired result after filtering is:
AAA 2
BBB 9
CCC 4
DDD 6
EEE 5
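To restate the rule: sort so that, within each key, the biggest second-column value comes first, then keep only the first line seen per key. A sketch of that statement (it assumes the two-column, space-separated layout shown above):

```shell
# Recreate the sample input so the sketch is self-contained.
cat > test.txt <<'EOF'
AAA 1
AAA 2
BBB 5
BBB 7
BBB 9
CCC 3
CCC 4
DDD 6
EEE 5
EOF

# Sort by key, then numerically descending on the second column,
# and let awk keep the first (i.e. biggest) line for each key.
sort -k1,1 -k2,2nr test.txt | awk '!seen[$1]++'
```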
I have created the ksh script below, which produces the result above, but when there are many rows it takes very long to run, so I would like some help getting the same result in less time.
#!/bin/ksh
INPUT=$1

# Sort by the first column, then numerically by the second, so that for
# each key the line with the biggest value comes last (plain sort is
# lexical and would put e.g. "10" before "9").
sort -k1,1 -k2,2n "$INPUT" > "$$" && mv "$$" "$INPUT"

cut -d' ' -f1 "$INPUT" | sort | uniq -d > ROWS_WITH_MULTIPLE_VALUES
cut -d' ' -f1 "$INPUT" | sort | uniq -u > ROWS_WITH_SINGLE_VALUES

if [[ -s ROWS_WITH_MULTIPLE_VALUES ]]
then
    while read VALUE_1
    do
        # Anchor the pattern so a key matches only at the start of a line.
        grep "^$VALUE_1 " "$INPUT" | tail -1
    done < ROWS_WITH_MULTIPLE_VALUES
fi

if [[ -s ROWS_WITH_SINGLE_VALUES ]]
then
    while read VALUE_1
    do
        grep "^$VALUE_1 " "$INPUT"
    done < ROWS_WITH_SINGLE_VALUES
fi

rm ROWS_WITH_MULTIPLE_VALUES ROWS_WITH_SINGLE_VALUES
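For comparison, the same result can be computed in a single pass with awk, holding one running maximum per key in memory instead of re-scanning the whole file with grep for every key. This is only a sketch under the same two-column, space-separated assumption:

```shell
# Recreate the sample input so the sketch is self-contained.
cat > test.txt <<'EOF'
AAA 1
AAA 2
BBB 5
BBB 7
BBB 9
CCC 3
CCC 4
DDD 6
EEE 5
EOF

# Single pass: keep the biggest second-column value seen for each
# first-column key, then print one line per key at the end.
awk '
    !($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }
    END { for (k in max) print k, max[k] }
' test.txt | sort
```

The `for (k in max)` loop visits keys in an unspecified order, hence the final `sort`; if the original input order mattered, a second array recording the order in which keys were first seen could be used instead.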
Any help will be greatly appreciated.
Cheers
Steve