Print line based on highest value of col (B) and repetition of values in col (A)

imahmoud · July 23, 2011, 8:54pm

Hello everyone,

I am writing a script to process data from the ATP world tour.

I have a file which contains:

t=540 y=2011 r=1 p=N409
t=540 y=2011 r=2 p=N409
t=540 y=2011 r=3 p=N409
t=540 y=2011 r=4 p=N409
t=520 y=2011 r=1 p=N409
t=520 y=2011 r=2 p=N409
t=520 y=2011 r=3 p=N409

The file is updated regularly with different `t' values (first column) and `r' values (third column). After each update, I want to print the line that has the highest value of `r' (third column) among the lines that share the first value of `t' in the file (first column).

So, in the above version of the file I want to print the 4th line:

t=540 y=2011 r=4 p=N409

But, for example if the file gets updated to:

t=560 y=2011 r=1 p=N409
t=560 y=2011 r=2 p=N409
t=560 y=2011 r=3 p=N409
t=560 y=2011 r=4 p=N409
t=560 y=2011 r=5 p=N409
t=560 y=2011 r=6 p=N409
t=540 y=2011 r=1 p=N409

Then, I will need to print the 6th line:

t=560 y=2011 r=6 p=N409

How can I find the line based on these criteria? Your help is greatly appreciated.

agama · July 23, 2011, 9:08pm

Assuming that lines with the same first field are grouped together in the input file (as in your samples), this should work:

awk '
    {
        if( last != "" && last != $1 )   # first token changed: first group is done
            exit( 0 );                   # END still runs after exit

        last = $1;
        split( $3, a, "=" );
        if( saved == "" || a[2] + 0 > max )   # token 3 is greater than max seen
        {
            saved = $0;           # save this record
            max = a[2] + 0;
        }
    }
    END {
        if( saved != "" )
            print saved;          # show the record with largest 3rd token,
    }                             # even when the file holds a single group
' input-file
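
If the lines for the first `t' value might not be contiguous, a variant along these lines could scan the whole file instead of stopping at the first change. This is only a sketch: the function name first_t_max is made up for illustration, and it assumes the same four-column layout as the samples, with the `r=N' token in the third field.

```shell
# first_t_max [FILE...]: print the line with the largest r= value among all
# lines whose first field matches the first field of line 1.
# (first_t_max is a hypothetical helper name, not part of any tool.)
first_t_max() {
    awk '
        NR == 1 { first = $1 }        # remember the first t=... token
        $1 == first {
            split($3, a, "=")         # a[2] holds the number after "r="
            if (saved == "" || a[2] + 0 > max) {
                saved = $0            # best line seen so far
                max = a[2] + 0        # force numeric comparison
            }
        }
        END { if (saved != "") print saved }
    ' "$@"
}
```

It would be called as `first_t_max input-file`. The trade-off is that it always reads the file to the end, whereas the early-exit approach stops as soon as the first group is over.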

imahmoud · July 23, 2011, 9:27pm

agama, I cannot find the words to thank you. It worked like a charm.

imahmoud · July 23, 2011, 9:37pm

Thank you so much, agama

shamrock · July 25, 2011, 12:04pm

yet another awk way...

awk -F'[=| ]' '{if(t[$2]<$6 && l[$2]=$0) t[$2]=$6}END{for(i in l) print l[i]}' file
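
Note that a per-`t' approach like this reports one line for every `t' value, and awk's `for (i in ...)' loop visits keys in an unspecified order, so the line for the first `t' is not guaranteed to come out first. A possible sketch that keeps the per-`t' idea but remembers the order of first appearance is shown here; the name max_r_per_t and the array names are invented for this example, and the field positions assume the sample layout.

```shell
# max_r_per_t [FILE...]: for each t value, print the line with the largest
# r value, in the order in which each t value first appears.
# (max_r_per_t is a hypothetical helper name.)
max_r_per_t() {
    awk -F'[= ]' '
        # With this separator: $2 is the t value, $6 is the r value.
        !($2 in best) {               # first time this t value is seen
            order[++n] = $2
            best[$2] = $6 + 0
            line[$2] = $0
            next
        }
        $6 + 0 > best[$2] {           # larger r for a t value already seen
            best[$2] = $6 + 0
            line[$2] = $0
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

With that ordering, `max_r_per_t file | head -n 1` should give the line asked for in the original question.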