What I would like to obtain is a file like this, in which, for each replicated ID (column 1), only the entry with the greatest length (and its coordinates) is reported.
Hi, it is not clear to me what you are looking for. Also, how do you get from the input file that you specified to, for example, ENSMUSG00000000003 and chrX?
What I am looking for is, for each ID (first column), to calculate the difference between columns 3 and 4 and keep only the line in which the difference is the biggest.
Thank you again, and if you have further questions do not hesitate to post a reply!
Giuliano
---------- Post updated at 03:06 PM ---------- Previous update was at 03:03 PM ----------
Hi
I am so sorry!!! I was in a hurry before and I did not check my message.
awk '{diff=$4-$3; diff=diff >= 0 ? diff:-diff; if (diff > diffs[$1]) {diffs[$1]=diff;lines[$1]=$0}} END {for (i in lines) {print lines[i]}}' file
If field 4 is always going to be greater than field 3, you can shorten it a bit by not bothering to calculate the absolute value. Also, the output is not guaranteed to preserve the order of the records in the file, because awk's for (i in lines) loop visits the array keys in an unspecified order.
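If the original order matters, one possible variation is to remember the order in which each ID first appears and print in that order. This is only a sketch and has not been run against your real data. The function name longest_per_id and the array names best, line and order are made up for illustration, and it assumes whitespace-separated fields with numeric values in columns 3 and 4. On ties it keeps the first line seen, and it also keeps an ID whose only line has a difference of zero.

```shell
# Hypothetical helper: for each ID in column 1, keep the line with the
# largest |col4 - col3|, printed in order of each ID's first appearance.
longest_per_id() {
    awk '
    {
        diff = $4 - $3
        if (diff < 0) diff = -diff        # absolute value of the difference
        if (!($1 in best)) {              # first time this ID is seen
            order[++n] = $1               # remember the input order
            best[$1] = diff; line[$1] = $0
        } else if (diff > best[$1]) {     # strictly larger: replace the kept line
            best[$1] = diff; line[$1] = $0
        }
    }
    END { for (k = 1; k <= n; k++) print line[order[k]] }
    ' "$@"
}
```

Call it as longest_per_id file, or pipe data into it with no argument, in which case awk reads standard input.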