Output minimum and maximum values for replicate IDs

giuliangiuseppe · July 2, 2014, 9:33am

Hi all,
I hope someone can help me.
I have an input file like this, with 4 columns (ID, feature1, start, end):

a x 1 5
b x 3 10
b x 4 9
b x 5 16
c x 5 9
c x 4 8

And my output file should be like this:

a x 1 5
b x 3 16
c x 4 9

What I would like to do is output, for each ID, the smallest start coordinate (column 3) and the largest end coordinate (column 4).

Thank you!

vgersh99 · July 2, 2014, 10:46am

awk '{idx=$1 FS $2} !(idx in a3){a3[idx]=$3+0;a4[idx]=$4+0} {a3[idx]=(a3[idx]<$3+0)?a3[idx]:$3+0;a4[idx]=($4+0>a4[idx])?$4+0:a4[idx]} END{for(i in a3)print i,a3[i],a4[i]}' myFile
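
One thing to keep in mind with this approach: awk does not specify the order in which `for (i in array)` visits keys, so the groups may not come out in input order. A minimal sketch, assuming output sorted by ID and feature is acceptable, pipes the result through `sort`. The function name `minmax_by_id` is made up for illustration, and the sample rows are the ones from the question.

```shell
# minmax_by_id is a hypothetical helper name, not part of any tool.
# It reads the files given as arguments, or stdin when there are none.
minmax_by_id() {
    awk '
        { key = $1 FS $2; s = $3 + 0; e = $4 + 0 }   # +0 forces numeric values
        !(key in lo) { lo[key] = s; hi[key] = e }    # first row for this key
        s < lo[key]  { lo[key] = s }                 # keep the smallest start
        e > hi[key]  { hi[key] = e }                 # keep the largest end
        END { for (key in lo) print key, lo[key], hi[key] }
    ' "$@" | sort -k1,1 -k2,2                        # make the order deterministic
}

# Sample input from the question
printf '%s\n' 'a x 1 5' 'b x 3 10' 'b x 4 9' 'b x 5 16' 'c x 5 9' 'c x 4 8' |
    minmax_by_id
```

On the sample input this should print the three lines requested in the question (a x 1 5, b x 3 16, c x 4 9).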

Yoda · July 2, 2014, 10:57am

awk '
        {
                k = $1 FS $2
                # Seed both arrays the first time a key is seen, so a start of 0 is kept
                if ( !(k in F) ) { F[k] = $3 + 0; S[k] = $4 + 0 }
                F[k] = F[k] > $3 + 0 ? $3 + 0 : F[k]    # smallest start
                S[k] = S[k] < $4 + 0 ? $4 + 0 : S[k]    # largest end
        }
        END {
                for ( k in F )
                        print k, F[k], S[k]
        }
' file
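
If the groups should come out in the order they first appear in the input rather than in whatever order `for ( k in ... )` happens to yield, one option is to record each key the first time it is seen. The following is only a sketch under that assumption. The function name `span_by_id`, the array names, and the sample rows are hypothetical. The rows are chosen so that one group has a start of 0, which is a useful edge case to try with any of these scripts.

```shell
# span_by_id is a hypothetical helper name used only for this example.
span_by_id() {
    awk '
        {
            k = $1 FS $2
            if (!(k in lo)) {          # first time this ID/feature pair is seen
                order[++n] = k         # remember first-seen position
                lo[k] = $3 + 0
                hi[k] = $4 + 0
            }
            if ($3 + 0 < lo[k]) lo[k] = $3 + 0   # smallest start so far
            if ($4 + 0 > hi[k]) hi[k] = $4 + 0   # largest end so far
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], lo[order[i]], hi[order[i]]
        }
    ' "$@"
}

# Hypothetical rows: group b comes first and has a smallest start of 0.
printf '%s\n' 'b x 0 10' 'a x 2 5' 'b x 4 16' | span_by_id
```

With these rows the output should list b before a, with b spanning 0 to 16.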