Hi Friends,
My input file looks like this:
chr1 100 120 abc
chr1 100 121 def
chr1 100 122 ghi
chr2 240 263 kil
chr2 240 276 ghj
chr2 255 290 hjh
My desired output is:
chr1 100 122 abc
chr2 240 276 kil
chr2 255 290 hjh
Basically, I want to group the records on the first and second columns and, for each group, print those two columns once, followed by the maximum of the third column and the fourth column of the group's first record.
Thanks
Will all of the entries in the file with matching 1st and 2nd columns be on contiguous lines in the file?
Yes, they are all sorted and are contiguous.
awk '{if(a[$1FS$2]<$3)a[$1FS$2]=$3;if(!($1FS$2 in key))key[$1FS$2]=$NF}END{for(i in a)print i,a[i],key[i]}' infile
The following shell script should do what you want:
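#!/bin/ksh
last1=
# Print the pending group, if there is one, then mark it as printed.
printlast() {
    if [ "x$last1" != x ]
    then
        printf "%s %s %s %s\n" "$last1" "$last2" "$last3" "$last4"
        last1=
    fi
}
while read in1 in2 in3 in4
do
    if [[ "x$in1" != "x$last1" || "x$in2" != "x$last2" ]]
    then
        # New key: flush the previous group and start a new one.
        printlast
        last1="$in1"
        last2="$in2"
        last3="$in3"
        last4="$in4"
    elif [ "$in3" -gt "$last3" ]
    then
        # Same key: keep the larger third column.
        last3="$in3"
    fi
done < in
# Flush the final group.
printlast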
There are lots of other ways to do this as well.
---------- Post updated at 05:33 PM ---------- Previous update was at 05:22 PM ----------
The awk one-liner above is one of the other ways to do it that I mentioned in my last message. Note, however, that the order of the output lines is not guaranteed to be the same as in the input file, because awk's for (i in a) loop visits array indices in an unspecified order. If maintaining the order is important, you would have to sort the output from this awk script or make the awk script more complex.
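For what it's worth, here is a rough sketch of what the "more complex" awk variant might look like if the input order has to be kept. It records each key the first time it appears and walks that list in the END block. The function name merge_groups is made up for this example, and the sketch assumes whitespace-separated fields with a numeric third column, as in the sample data.

```shell
#!/bin/sh
# merge_groups is a hypothetical helper name, not from the posts above.
# Reads records on stdin; assumes field 3 is numeric.
merge_groups() {
    awk '
    {
        k = $1 FS $2
        if (!(k in max)) {
            # First record for this key: remember its position and 4th field.
            order[++n] = k
            max[k] = $3
            first[k] = $4
        } else if ($3 + 0 > max[k] + 0) {
            # Later record for the same key: keep the larger 3rd field.
            max[k] = $3
        }
    }
    END {
        # Walk the keys in the order they first appeared in the input.
        for (i = 1; i <= n; i++)
            print order[i], max[order[i]], first[order[i]]
    }'
}
```

It could be run as merge_groups < infile. Because the keys are stored in arrival order, this does not depend on the records being contiguous, at the cost of holding one entry per key in memory.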