AWK, Perl or Shell? Unique strings and their maximum values from 3 column data file

rich1 · February 22, 2012, 5:02am

I have a file containing data like so:

2012-01-02 GREEN 4
2012-01-02 GREEN 6
2012-01-02 GREEN 7
2012-01-02 BLUE 4
2012-01-02 BLUE 3
2012-01-02 GREEN 4
2012-01-02 RED 4
2012-01-02 RED 8
2012-01-02 GREEN 4
2012-01-02 YELLOW 5
2012-01-02 YELLOW 2

I can't always predict what the strings in the second column will be (the example above uses colours, but the data file could contain any string in column two). There is, however, always a number in the third column, and I want the maximum of that number for each particular string in column two. Is awk able to:

Pull out each of the unique strings in column 2?
For each of the unique strings, get the maximum associated value? Using the data above, you'd end up with the following:

2012-01-02 GREEN 7
2012-01-02 BLUE 4
2012-01-02 RED 8
2012-01-02 YELLOW 5

Or would this be easier with Perl (or even shell)? Any code examples much appreciated!

itkamaraj · February 22, 2012, 5:10am

$ sort -k2,2r -k3,3nr input.txt | nawk '!a[$2]++'
2012-01-02 YELLOW 5
2012-01-02 RED 8
2012-01-02 GREEN 7
2012-01-02 BLUE 4
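
The awk filter in the pipeline above relies on a common idiom: a[$2]++ evaluates to 0 (false) the first time a column-2 value is seen, so the negation is true and the line is printed; every later line with the same value is skipped. Because sort has already put the largest value first within each group, the first line per group is the maximum. A rough long-hand sketch of just that filter, assuming whitespace-separated fields (the function name first_per_key and the array name seen are made up for illustration):

```shell
# first_per_key (hypothetical helper): keep only the first line seen
# for each distinct value of column 2; later duplicates are dropped.
first_per_key() {
    awk '{
        if (seen[$2] == 0)   # counter still zero: first occurrence of this key
            print            # print the whole line
        seen[$2]++           # count this occurrence
    }' "$@"
}
```

It reads from the named files or from standard input, so it can stand in for the nawk part of the pipeline.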

Franklin52 · February 22, 2012, 5:12am

Another approach:

awk '{a[$1 FS $2]=a[$1 FS $2] > $3?a[$1 FS $2]:$3}END{for(i in a)print i, a[i]}' file
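
For anyone who wants the logic spelled out, here is a minimal commented sketch of the same single-pass idea. It is a variation rather than an exact equivalent: it assumes whitespace-separated fields, groups on column 2 only, keeps the whole row that holds the maximum, and prints groups in the order they first appear (the order of for (i in a) is unspecified in awk). The function name max_per_key and the array names are invented for this example.

```shell
# max_per_key (hypothetical name): for each distinct string in column 2,
# print the row with the largest numeric value in column 3.
max_per_key() {
    awk '
    {
        key = $2
        val = $3 + 0                  # force a numeric comparison
        if (!(key in max)) {          # first time this key is seen
            order[++n] = key          # remember first-seen order
            max[key] = val
            row[key] = $0
        } else if (val > max[key]) {  # strictly larger value found
            max[key] = val
            row[key] = $0
        }
    }
    END {
        for (i = 1; i <= n; i++)      # replay keys in first-seen order
            print row[order[i]]
    }' "$@"
}
```

Called as max_per_key file, this should give the GREEN, BLUE, RED, YELLOW rows from the question in that order. Testing (key in max) instead of comparing against an empty entry also means negative values are handled.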

jayan_jay · February 22, 2012, 5:14am

A post related to your query: Only print the entries with the highest number?

balajesuri · February 22, 2012, 5:14am

perl -ane 'if($F[2] > $x{$F[1]}){$y{$F[1]}=$F[0]; $x{$F[1]}=$F[2]}; END{for(keys %x){print "$y{$_} $_ $x{$_}\n"}}' inputfile