minimum and maximum from columns

jacobs.smith · September 11, 2012, 3:49pm

Hi Friends,

My input file looks like this:

chr1 100 120 abc
chr1 100 121 def
chr1 100 122 ghi
chr2 240 263 kil
chr2 240 276 ghj
chr2 255 290 hjh

My desired output:

chr1 100 122 abc
chr2 240 276 kil
chr2 255 290 hjh

Basically, I want to group records on the first and second columns, then print each unique pair of first and second columns followed by the maximum of the third column and the fourth-column value from the first record in that group.

Thanks

Don_Cragun · September 11, 2012, 6:39pm

Will all of the entries in the file with matching 1st and 2nd columns be on contiguous lines in the file?

jacobs.smith · September 11, 2012, 8:10pm

Yes, they are all sorted and are contiguous.

complex.invoke · September 11, 2012, 8:20pm

awk '{if(a[$1FS$2]<$3)a[$1FS$2]=$3;if(!key[$1FS$2])key[$1FS$2]=$NF}END{for(i in a)print i,a[i],key[i]}' infile

Don_Cragun · September 11, 2012, 8:33pm

The following shell script should do what you want:

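#!/bin/ksh
# last1..last4 hold the group currently being accumulated; an empty last1 means no group is pending.
last1=
# Print the pending group, if any, and mark it as printed.
printlast() {
        if [ "x$last1" != x ]
        then
                printf "%s %s %s %s\n" "$last1" "$last2" "$last3" "$last4"
                last1=
        fi
}
while read in1 in2 in3 in4
do
        # A change in column 1 or 2 starts a new group (matching lines are contiguous).
        if [[ "x$in1" != "x$last1" || "x$in2" != "x$last2" ]]
        then    printlast
                last1="$in1"
                last2="$in2"
                last3="$in3"
                last4="$in4"
        # Same group: keep the larger third column; column 4 stays from the first record.
        else    if [ "$in3" -gt "$last3" ]
                then    last3="$in3"
                fi
        fi
done < in
# Flush the final group.
printlast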

There are lots of other ways to do this as well.

---------- Post updated at 05:33 PM ---------- Previous update was at 05:22 PM ----------

This is one of the other ways to do it I mentioned in my last message. Note, however, that the order of the output lines is not guaranteed to be the same as in the input file when using awk this way. If maintaining the order is important, you would have to sort the output from this awk script or make the awk script more complex.
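
For readers who need the input order kept, here is a minimal sketch of what a "more complex" awk script might look like. It is not taken from any post in this thread. The function name max_per_group and the array names order, max, and first are made up for illustration. It records each key the first time it is seen and prints the groups in that order, so no sorting step is needed.

```shell
# Hypothetical helper: reads the named files, or standard input if none are given.
max_per_group() {
    awk '
    {
        k = $1 FS $2                      # group key: columns 1 and 2
        if (!(k in max)) {                # first record seen for this key
            order[++n] = k                # remember the order of first appearance
            max[k] = $3
            first[k] = $4                 # column 4 comes from the first record
        } else if ($3 + 0 > max[k] + 0) { # "+ 0" forces a numeric comparison
            max[k] = $3
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], max[order[i]], first[order[i]]
    }' "$@"
}
```

Called as max_per_group infile, this should print the groups in the same order as they first appear in the input, and it does not rely on matching lines being contiguous.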